Abstract



We investigate the stability of a Sequential Monte Carlo (SMC) method applied to the problem of sampling
from a target distribution on $\mathbb{R}^d$ for large $d$. It is well known [9, 14, 56] that using a single importance sampling
step one produces an approximation for the target that deteriorates as the dimension $d$ increases, unless the
number of Monte Carlo samples $N$ increases at an exponential rate in $d$. We show that this degeneracy can be
avoided by introducing a sequence of artificial targets, starting from a 'simple' density and moving to the one of
interest, using an SMC method to sample from the sequence (see e.g. [20, 27, 38, 48]). Using this class of SMC
methods with a fixed number of samples, one can produce an approximation for which the effective sample size
(ESS) converges to a random variable $\varepsilon_N$ as $d \to \infty$ with $1 < \varepsilon_N < N$. The convergence is achieved with a
computational cost proportional to $Nd^2$. If $\varepsilon_N \ll N$, we can raise its value by introducing a number of resampling
steps, say $m$ (where $m$ is independent of $d$). In this case, ESS converges to a random variable $\varepsilon_{N,m}$ as $d \to \infty$
and $\lim_{m\to\infty} \varepsilon_{N,m} = N$. Also, we show that the Monte Carlo error for estimating a fixed dimensional marginal
expectation is of order $1/\sqrt{N}$ uniformly in $d$. The results imply that, in high dimensions, SMC algorithms can
efficiently control the variability of the importance sampling weights and estimate fixed dimensional marginals at
a cost which is less than exponential in $d$ and indicate that, in high dimensions, resampling leads to a reduction
in the Monte Carlo error and increase in the ESS.
1 Introduction

Sequential Monte Carlo (SMC) methods can be described as a collection of techniques that approximate a sequence
of distributions, known up to a normalizing constant, of increasing dimension. Typically, the complexity of these
distributions is such that one cannot rely upon standard simulation approaches. SMC methods are applied in
a wide variety of applications, including engineering, economics and biology, see [33] and Chapter VIII in [24]
for an overview. They combine importance sampling and resampling to approximate distributions. The idea is to
introduce a sequence of proposal densities and sequentially simulate a collection of $N > 1$ samples, termed particles,
in parallel from these proposals. In most scenarios it is not possible to use the distribution of interest as a proposal.
Therefore, one must correct for the discrepancy between proposal and target via importance weights. In almost all
cases of practical interest, the variance of these importance weights increases with algorithmic time (e.g. [41]); this
can, to some extent, be dealt with via resampling. This consists of sampling with replacement from the current
samples using the weights and resetting them to $1/N$. The variability of the weights is often measured by the
effective sample size ([44]) and one often resamples when this drops below a threshold (dynamic resampling).

There are a wide variety of convergence results for SMC methods, most of them concerned with the accuracy of 
the particle approximation of the distribution of interest as a function of $N$. A less familiar context, related to
this paper, arises in the case when the difference in the dimension of the consecutive densities becomes large. Whilst 
in filtering there are several studies on the stability of SMC as the time step grows (see e.g. [2T1 l2l)l 1501 |3"T1 131)1 142| ) 
they do not consider this latter scenario. In addition, there is a vast literature on the performance of
high-dimensional Markov chain Monte Carlo (MCMC) algorithms, e.g. [11, 51, 52]; our aim is to obtain a similar analytical
understanding about the effect of dimension on SMC methods. The articles [6, 9, 14, 56] have considered some
problems in this direction. In [9, 14, 56] the authors show that, for an i.i.d. target, as the dimension of the state
grows to infinity one requires, for some stability properties, a number of particles which grows exponentially in
dimension (or 'effective dimension' in [56]); the algorithm considered is standard importance sampling. We discuss
these results below.
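
The degeneracy of a single importance sampling step can be illustrated numerically. The following is a minimal sketch and not taken from the article: it assumes a standard normal target on $\mathbb{R}^d$ and an over-dispersed normal proposal (the value `proposal_sd=1.5` and all function names are illustrative choices), and reports the ESS of the importance weights for a fixed number of particles as $d$ grows.

```python
import math
import random


def ess(log_weights):
    """Effective sample size (sum w)^2 / sum w^2, computed stably from log-weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


def importance_sampling_ess(d, n_particles, proposal_sd=1.5, seed=0):
    """ESS of one importance-sampling step targeting N(0, I_d) from N(0, proposal_sd^2 I_d).

    Both densities are illustrative assumptions, chosen so that the weights factorise
    over the d i.i.d. co-ordinates.
    """
    rng = random.Random(seed)
    log_w = []
    for _ in range(n_particles):
        lw = 0.0
        for _ in range(d):
            x = rng.gauss(0.0, proposal_sd)
            # log target minus log proposal, dropping constants that cancel in the ESS
            lw += -0.5 * x * x + 0.5 * (x / proposal_sd) ** 2
        log_w.append(lw)
    return ess(log_w)


if __name__ == "__main__":
    for d in (1, 10, 100, 400):
        print(d, round(importance_sampling_ess(d, n_particles=500), 1))
```

With the number of particles held fixed, the reported ESS should fall towards 1 as $d$ increases, which is the behaviour the cited results describe for standard importance sampling.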
1.1 Contribution of the Article



We investigate the stability of an SMC algorithm in high dimensions used to produce a sample from a sequence
of probabilities on a common state-space. This problem arises in a wide variety of applications including many
encountered in Bayesian statistics. For some Bayesian problems the posterior density can be very 'complex', that is,
multi-modal and/or with high correlations between certain variables in the target ('static' inference, see e.g. [40]).
A commonly used idea is to introduce a simple distribution, which is more straightforward to sample from, and to
interpolate between this distribution and the actual posterior by introducing intermediate distributions from which
one samples sequentially. Whilst this problem departs from the standard ones in the SMC literature, it is possible
to construct SMC methods to approximate this sequence; see [20, 27, 38, 48]. The methodology investigated here
is applied in many practical contexts: financial modelling [39], regression [54] and approximate Bayesian inference
[28]. In addition, high-dimensional problems are of practical importance and normally more challenging than their
low-dimensional counterparts. The question we look at is whether such algorithms, as the dimension $d$ of the
distributions increases, are stable in any sense. That is, whilst $d$ is fixed in practice, we would like to identify the
computational cost of the algorithm for large $d$ that ensures that the algorithm is stable. Within the SMC context
described here, we quote the following statement made in [14]:

'Unfortunately, for truly high dimensional systems, we conjecture that the number of intermediate steps 
would be prohibitively large and render it practically infeasible.' 

One of the objectives of this article is to investigate the above statement from a theoretical perspective. In the 
sequel we show that for a certain class of target densities: 

• The SMC algorithm analyzed, with computational cost $O(Nd^2)$, is stable. Analytically, we prove that ESS
converges weakly to a non-trivial random variable $\varepsilon_N$ as $d$ grows and the number of particles is kept fixed.
In addition, we show that the Monte Carlo error of the estimation of fixed dimensional marginals, for a fixed
number of particles $N$, is of order $1/\sqrt{N}$ uniformly in $d$. The algorithm can include dynamic resampling at
some particular deterministic times. In this case, the algorithm will resample $O(1)$ times. Our results indicate
that estimates will improve when one resamples.

• The dynamically resampling SMC algorithm (with stochastic times and some minor modifications) will, with
probability greater than or equal to $1 - M/\sqrt{N}$, where $M$ is a positive constant independent of $N$, also exhibit
these properties.

• Our results are proved for $O(d)$ steps in the algorithm. If one takes $O(d^{1+\delta})$ steps with any $\delta > 0$, then ESS
converges in probability to $N$ and the Monte Carlo error is the same as with i.i.d. sampling. If $-1 < \delta < 0$
then ESS will go to zero (Corollary 6.1). That is, $O(d)$ steps are a critical order for the stability of the
algorithm in our scenario.

Our results show that in high-dimensional problems, one is able to control the variability of the weights; this is 
a minimum requirement for applying the algorithm. They also establish that one can estimate fixed dimensional 
marginals even as the dimension d increases. The results help to answer the point of [13] quoted above. In the 
presence of a quadratic cost and increasingly sophisticated hardware (e.g. [43]) SMC methods are in fact applicable, 
in the static context, in high dimensions. To support this, [39] presents further empirical evidence of the results
presented here. In particular, it is shown there that SMC techniques are algorithmically stable for models of 
dimension over 1000 with computer simulations that run in just over 1 hour. Hence the SMC techniques analyzed 
here can certainly be used for high-dimensional static problems. The analysis of such methods for time-dependent 
applications (e.g. filtering) is subject to further research. 

When there is no resampling, the proofs of our results rely on martingale array techniques. To show that the 
algorithm is stable we establish a functional central limit theorem (fCLT) under easily verifiable conditions, for a 
triangular array of non-homogeneous Markov chains. This allows one to establish the convergence in distribution 
of ESS (as d increases) . The result also demonstrates the dependence of the algorithm on a mixture of asymptotic 
variances (in the Markov chain CLT) of the non-homogeneous kernels. 

1.2 Structure of the Article 

In Section 2 we discuss the SMC algorithm of interest and the class of target distributions we consider. In Section 3
we show that ESS converges in distribution to a non-trivial random variable as $d \to \infty$ when the algorithm does
not resample. We also show that the Monte Carlo error of the estimation of fixed dimensional marginals, for a fixed
number of particles $N$, has an upper bound of the form $M/\sqrt{N}$, where $M$ is independent of $d$. We address the issue
of resampling in Section 4, where it is shown that as $d \to \infty$ any dynamically resampling SMC algorithm, using the
deterministic ESS (the expected ESS with one particle), will resample $O(1)$ times and also exhibit convergence of
the ESS and Monte Carlo error. In addition, any dynamically resampling SMC algorithm, using the empirical ESS
(with some modification) will, with high probability, display the same convergence of the ESS and Monte Carlo
error. In Section 5 we verify the assumptions involved for a particular example. Finally, we conclude in Section 6
with some remarks on $O(d)$ steps being a critical order and ideas for future work. Proofs are collected in the
Appendix.



1.3 Notation 

Let $(E, \mathscr{E})$ be a measurable space and $\mathscr{P}(E)$ be the set of probability measures on $(E, \mathscr{E})$. For a given function
$V : E \mapsto [1, \infty)$ we denote by $\mathscr{L}_V$ the class of functions $f : E \mapsto \mathbb{R}$ for which

$$|f|_V := \sup_{x \in E} \frac{|f(x)|}{V(x)} < +\infty .$$

For two Markov kernels, $P$ and $Q$ on $(E, \mathscr{E})$, we define the $V$-norm:

$$|||P - Q|||_V := \sup_{x \in E} \frac{\sup_{|f| \le V} |P(f)(x) - Q(f)(x)|}{V(x)} ,$$

with $P(f)(x) := \int_E P(x, dy) f(y)$. The notation

$$\|P(x, \cdot) - Q(x, \cdot)\|_V := \sup_{|f| \le V} |P(f)(x) - Q(f)(x)|$$

is also used. For $\mu \in \mathscr{P}(E)$ and $P$ a Markov kernel on $(E, \mathscr{E})$, we adopt the notation $\mu P(f) := \int_E \mu(dx) P(f)(x)$.
In addition, $P^n(f)(x) := \int_{E^{n-1}} P(x, dx_1) P(x_1, dx_2) \times \cdots \times P(f)(x_{n-1})$. $\mathscr{B}(\mathbb{R})$ is used to denote the class of Borel
sets and $C_b(\mathbb{R})$ the class of bounded continuous $\mathscr{B}(\mathbb{R})$-measurable functions. Denote $\|f\|_\infty = \sup_{x \in \mathbb{R}} |f(x)|$. We
will also define the $L_\varrho$-norm, $\|X\|_\varrho = E^{1/\varrho}|X|^\varrho$, for $\varrho \ge 1$ and denote by $L_\varrho$ the space of random variables such
that $\|X\|_\varrho < \infty$. For $d \ge 1$, $\mathcal{N}_d(\mu, \Sigma)$ denotes the $d$-dimensional normal distribution with mean $\mu$ and covariance
$\Sigma$; when $d = 1$ the subscript is dropped. For any vector $(x_1, \ldots, x_p)$, we denote by $x_{q:s}$ the vector $(x_q, \ldots, x_s)$ for
any $1 \le q \le s \le p$. Throughout $M$ is used to denote a constant whose meaning may change, depending upon the
context; any (important) dependencies are written as $M(\cdot)$.



2 Sequential Monte Carlo 

We wish to sample from a target distribution with density $\Pi$ on $\mathbb{R}^d$ with respect to Lebesgue measure, known up to
a normalizing constant. We introduce a sequence of 'bridging' densities which start from an easy to sample target
and evolve toward $\Pi$. In particular, we will consider (e.g. [27]):

$$\Pi_n(x) \propto \Pi(x)^{\phi_n} , \quad x \in \mathbb{R}^d , \tag{1}$$

for $0 < \phi_0 < \cdots < \phi_{n-1} < \phi_n < \cdots < \phi_p = 1$. The effect of exponentiating with the small constant $\phi_0$ is that
$\Pi(x)^{\phi_0}$ is much 'flatter' than $\Pi$. Other choices of bridging densities are possible and are discussed in the sequel.

One can sample from the sequence of densities using an SMC sampler, which is, essentially, a Sequential
Importance Resampling (SIR) algorithm or particle filter that targets the sequence of densities:

$$\widetilde{\Pi}_n(x_{1:n}) = \Pi_n(x_n) \prod_{j=1}^{n-1} L_j(x_{j+1}, x_j)$$

with domain $(\mathbb{R}^d)^n$ of dimension that increases with $n = 1, \ldots, p$; here, $\{L_n\}$ is a sequence of artificial backward
Markov kernels that can, in principle, be arbitrarily selected. The work in [37] motivates the selection of $\{L_n\}$ and
characterizes the optimal kernel, in terms of minimizing the variance of the importance weights for SMC. Let $\{K_n\}$
be a sequence of Markov kernels of invariant density $\{\Pi_n\}$ and $\Upsilon$ a distribution; assuming the weights appearing
0. Sample $X_0^1, \ldots, X_0^N$ i.i.d. from $\Upsilon$ and compute the weights for each particle $i \in \{1, \ldots, N\}$:

   $$w_0(x_0^i) = \frac{\Pi_0(x_0^i)}{\Upsilon(x_0^i)} .$$

   Set $n = 1$ and $l = 0$.

1. If $n \le p$, for each $i$ sample $X_n^i \mid x_{n-1}^i$ from $K_n$ and calculate the weights

   $$w_n(x_{l:n-1}^i) = \frac{\Pi_n(x_{n-1}^i)}{\Pi_{n-1}(x_{n-1}^i)} \, w_{n-1}(x_{l:n-2}^i)$$

   with the convention $x_{0:-1}^i = x_0^i$. Calculate the Effective Sample Size (ESS):

   $$\mathrm{ESS}_{(l,n)}(N) := \frac{\big( \sum_{i=1}^N w_n(x_{l:n-1}^i) \big)^2}{\sum_{i=1}^N w_n(x_{l:n-1}^i)^2} . \tag{2}$$

   If $\mathrm{ESS}_{(l,n)}(N) < a$:

      resample $x_n^1, \ldots, x_n^N$ according to their normalised weights

      $$w_n(x_{l:n-1}^i) \Big/ \sum_{j=1}^N w_n(x_{l:n-1}^j) ; \tag{3}$$

      set $l = n$;

      re-initialise the weights by setting $w_n(x_{l:n-1}^i) \equiv 1$, $1 \le i \le N$;

      let $x_n^1, \ldots, x_n^N$ now denote the resampled particles.

   Set $n = n + 1$.

   Return to the start of Step 1.

Figure 1: The SMC algorithm analyzed in this article. 
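
The steps of Figure 1 can be sketched in code. The block below is only an illustration under assumptions that are not part of the article: the target is taken to be a product of standard normals (so $g(x) = -x^2/2$), the kernels are single random-walk Metropolis updates per co-ordinate, the initial distribution is $\Pi_0$ itself, and the temperatures are equally spaced between `phi0` and 1. The names `smc_sampler`, `rwm_move` and `weighted_mean` are hypothetical.

```python
import math
import random


def g(x):
    # Assumed example: one standard-normal co-ordinate, pi(x) proportional to exp(g(x)).
    return -0.5 * x * x


def ess(log_w):
    # ESS = (sum w)^2 / sum w^2, as in (2); shifting by the max avoids overflow.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    return sum(w) ** 2 / sum(v * v for v in w)


def rwm_move(x, phi, rng, step=1.0):
    # Assumed kernel: one random-walk Metropolis update per co-ordinate,
    # which leaves the density proportional to exp(phi * g) invariant.
    moved = []
    for xj in x:
        prop = xj + step * rng.gauss(0.0, 1.0)
        log_acc = phi * (g(prop) - g(xj))
        accept = log_acc >= 0.0 or rng.random() < math.exp(log_acc)
        moved.append(prop if accept else xj)
    return moved


def smc_sampler(d, n_particles, phi0=0.1, threshold=None, seed=0):
    rng = random.Random(seed)
    a = n_particles / 2.0 if threshold is None else threshold
    dphi = (1.0 - phi0) / d
    # Step 0: sample from Pi_0 (N(0, 1/phi0) per co-ordinate here), so w_0 = 1.
    sd0 = 1.0 / math.sqrt(phi0)
    xs = [[rng.gauss(0.0, sd0) for _ in range(d)] for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    ess_trace, n_resamples = [], 0
    for n in range(1, d + 1):
        phi_n = phi0 + n * dphi
        # Incremental weight Pi_n(x_{n-1}) / Pi_{n-1}(x_{n-1}) uses the pre-move state.
        log_w = [lw + dphi * sum(g(xj) for xj in x) for lw, x in zip(log_w, xs)]
        # Move every particle with a kernel that preserves Pi_n.
        xs = [rwm_move(x, phi_n, rng) for x in xs]
        current = ess(log_w)
        ess_trace.append(current)
        if current < a:
            # Multinomial resampling, then reset the weights to 1.
            m = max(log_w)
            probs = [math.exp(v - m) for v in log_w]
            idx = rng.choices(range(n_particles), weights=probs, k=n_particles)
            xs = [list(xs[i]) for i in idx]
            log_w = [0.0] * n_particles
            n_resamples += 1
    return xs, log_w, ess_trace, n_resamples


def weighted_mean(xs, log_w, f):
    # Self-normalised importance sampling estimate of the expectation of f.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    return sum(wi * f(x) for wi, x in zip(w, xs)) / sum(w)


if __name__ == "__main__":
    xs, log_w, trace, k = smc_sampler(d=20, n_particles=100)
    print("resampling steps:", k)
    print("estimate of E[x_1^2]:", weighted_mean(xs, log_w, lambda x: x[0] ** 2))
```

Passing `threshold=0.0` switches resampling off, which corresponds to the annealed importance sampling variant; the recorded `ess_trace` holds the ESS values computed before any resampling at each step.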

below are well-defined Radon-Nikodym derivatives, the SMC algorithm we will ultimately explore is the one defined
in Figure 1. It arises when the backward Markov kernels $L_n$ are chosen as follows:

$$L_n(x, x') = \frac{\Pi_{n+1}(x') K_{n+1}(x', x)}{\Pi_{n+1}(x)} .$$

With no resampling, the algorithm coincides with the annealed importance sampling in [48]. For simplicity, we will
henceforth assume that $\Upsilon \equiv \Pi_0$. It is remarked that, due to the results of [9, 14, 56], it appears that the cost of
the population Monte Carlo method of [18] would increase exponentially with the dimension; instead we will show
that the 'bridging' SMC sampler framework above will be of smaller cost.

ESS defined in (2) is typically used to quantify the quality of SMC approximations associated to systems of
weighted particles. It is a number between 1 and $N$, and in general the larger the value, the better the approximation.
Resampling is often performed when ESS falls below some pre-specified threshold such as $a = N/2$. The operation
of resampling consists of sampling with replacement from the current set of particles via the normalized weights in
(3) and resetting the (unnormalized) weights to 1. There is a wide variety of resampling techniques and we refer
the reader to [33] for details; in this article we only consider the multinomial method just described above.

2.1 Framework 

We will investigate the stability of the SMC algorithm in Figure 1 as $d \to \infty$. To obtain analytical results we will
need to simplify the structure of the algorithm (similarly to MCMC results in high dimensions in e.g. [7, 11, 51, 52]).
In particular, we will consider an i.i.d. target:

$$\Pi(x) = \prod_{j=1}^{d} \pi(x_j) ; \quad \pi(x_j) = \exp\{g(x_j)\} , \quad x_j \in \mathbb{R} , \tag{4}$$

for some $g : \mathbb{R} \mapsto \mathbb{R}$. In such a case all bridging densities are also i.i.d.:

$$\Pi_n(x) \propto \prod_{j=1}^{d} \pi_n(x_j) ; \quad \pi_n(x_j) \propto \exp\{\phi_n \, g(x_j)\} .$$

It is remarked that this assumption is made for mathematical convenience (clearly, in an i.i.d. context one could
use standard sampling schemes). Still, such a context allows for a rigorous mathematical treatment; at the same
time (and similarly to corresponding extensions of results for MCMC algorithms in high dimensions) one would
expect that the analysis we develop in this paper for i.i.d. targets will also be relevant in practice for more general
scenarios; see Section 6 for some discussion. A further assumption that will facilitate the mathematical analysis is
to apply independent kernels along the different co-ordinates. That is, we will assume:

$$K_n(x, dx') = \prod_{j=1}^{d} k_n(x_j, dx'_j) , \tag{5}$$

where each transition kernel $k_n(\cdot, \cdot)$ preserves $\pi_n(x)$; that is, $\pi_n k_n = \pi_n$. Clearly, this also implies that $\Pi_n K_n = \Pi_n$.

The stability of ESS will be investigated as $d \to \infty$: first without resampling and then with resampling. We
study the case when one selects cooling constants $\phi_n$ and $p$ as below:

$$p = d ; \quad \phi_n \, (= \phi_{n,d}) = \phi_0 + \frac{n (1 - \phi_0)}{d} , \quad 0 \le n \le d , \tag{6}$$

with $0 < \phi_0 < 1$ given and fixed with respect to $d$. It will be shown that such a selection will indeed provide
a 'stable' SMC algorithm as $d \to \infty$. Note that $\phi_0 > 0$ as we will be concerned with probability densities on
non-compact spaces.
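
As a small illustration of the linear cooling schedule in (6) and of why it leads to a quadratic cost, the sketch below builds the temperatures for a given $d$ and counts scalar kernel applications. The function names are hypothetical; the cost count simply assumes one scalar kernel update per particle, per co-ordinate and per step.

```python
def cooling_schedule(d, phi0):
    """Return [phi_0, ..., phi_d] with phi_n = phi0 + n * (1 - phi0) / d."""
    if not 0.0 < phi0 < 1.0:
        raise ValueError("phi0 must lie strictly between 0 and 1")
    return [phi0 + n * (1.0 - phi0) / d for n in range(d + 1)]


def kernel_coordinate_updates(d, n_particles):
    """Scalar kernel applications: N particles x d co-ordinates x d annealing steps."""
    return n_particles * d * d


if __name__ == "__main__":
    print(cooling_schedule(4, 0.2))
    print(kernel_coordinate_updates(d=1000, n_particles=100))
```

The increment between consecutive temperatures is $(1 - \phi_0)/d$, so doubling $d$ doubles the number of steps as well as the number of co-ordinates moved per step.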

Remark 2.1. Since $\{\phi_n\}$ will change with $d$, all elements of our SMC algorithm will also depend on $d$. We use the
double-subscripted notation $k_{n,d}$, $\pi_{n,d}$ when needed to emphasize the dependence of $k_n$ and $\pi_n$ on $d$, which ultimately
depend on $n$, $d$ through $\phi_{n,d}$. Similarly, we will sometimes write $X_n(d)$, or $x_n(d)$, for the Markov chain involved in
the specification of the SMC algorithm.

Remark 2.2. Although the algorithm runs in discrete time, it will be convenient for the presentation of our results
that we consider the successive steps of the algorithm as placed on the continuous time interval $[\phi_0, 1]$, incremented
by the annealing discrepancy $(1 - \phi_0)/d$. We will use the mapping

$$l_d(t) = \Big\lfloor d \, \frac{t - \phi_0}{1 - \phi_0} \Big\rfloor \tag{7}$$

to switch between continuous time and discrete time. Related to the above, it will be convenient to consider the
continuum of invariant densities and kernels on the whole of the time interval $[\phi_0, 1]$. So, we will set:

$$\pi_s(x) \propto \pi(x)^s = \exp\{s \, g(x)\} , \quad s \in [\phi_0, 1] .$$

That is, we will use the convention $\pi_n \equiv \pi_{\phi_n}$ with the subscript on the left running on the set $\{1, 2, \ldots, d\}$.
Accordingly, $k_s(\cdot, \cdot)$, with $s \in (\phi_0, 1]$, will denote the transition kernel preserving $\pi_s$.

2.2 Conditions 

We state the conditions under which we derive our results. Throughout, we set $k_{\phi_0} \equiv \pi_{\phi_0}$ and $(E, \mathscr{E}) = (\mathbb{R}, \mathscr{B}(\mathbb{R}))$.
We assume that $g(\cdot)$ is an upper bounded function. In addition, we make the following assumptions for the
continuum of kernels/densities:

(A1) Stability of $\{k_s\}$.

(i) (One-Step Minorization). We assume that there exists a set $C \in \mathscr{E}$, a constant $\theta \in (0, 1)$ and some
$\nu \in \mathscr{P}(E)$ such that for each $s \in (\phi_0, 1]$ the set $C$ is $(1, \theta, \nu)$-small with respect to $k_s$.

(ii) (One-Step Drift Condition). There exists $V : E \mapsto [1, \infty)$ with $\lim_{|x| \to \infty} V(x) = \infty$, constants $\lambda < 1$,
$b < \infty$, and $C \in \mathscr{E}$ as specified in (i) such that for any $x \in E$ and $s \in (\phi_0, 1]$:

$$k_s V(x) \le \lambda V(x) + b \, \mathbb{I}_C(x) .$$

In addition $\pi_{\phi_0}(V) < \infty$.

(iii) (Level Sets). Define $C_c = \{x : V(x) \le c\}$ with $V$ as in (ii). Then there exists a $c \in (1, \infty)$ such that
for every $s \in (\phi_0, 1)$, $C_c$ is a $(1, \theta, \nu)$-small set with respect to $k_s$. In addition, condition (ii) holds for
$C = C_c$, and $\lambda$, $b$ (possibly depending on $c$) such that $\lambda + b/(1 + c) < 1$.

(A2) Perturbations of $\{k_s\}$.

There exists an $M < \infty$ such that for any $s, t \in (\phi_0, 1]$

$$|||k_s - k_t|||_V \le M |s - t| .$$

The statement that $C$ is $(1, \theta, \nu)$-small w.r.t. $k_s$ means that $C$ is a one-step small set for the Markov kernel,
with minorizing distribution $\nu$ and parameter $\theta \in (0, 1)$ (see e.g. [47]).

Assumptions like (A1) are fairly standard in the literature on adaptive MCMC (e.g. [1]). Note though that the
context in this paper is different. For adaptive MCMC one typically has that the kernels will eventually converge
to some limiting kernel. Conversely, in our set-up, the $d$ bridges (resp. kernels) in between $\pi_0$ (resp. $k_0$) and $\pi_d$
(resp. $k_d$) will effectively make up a continuum of densities $\pi_s$ (resp. kernels $k_s$), with $s \in [\phi_0, 1]$, as $d$ grows to
infinity. The second assumption above differs from standard adaptive MCMC but will be verifiable in real contexts.
Note that one could perhaps relax our assumptions to, e.g., sub-geometric ergodicity versus geometric ergodicity, at
the cost of an increased level of complexity in the proofs. It is also remarked that the assumption that $g$ is upper
bounded is only used in Section 4 when controlling the resampling times. The assumptions adopted in this article
are certainly not weak, but still are very close to the weakest assumptions adopted in state-of-the-art research on
stability of SMC, see [57, 58, 59].

3 The Algorithm Without Resampling 

We will now consider the case when we omit the resampling steps in the specification of our SMC algorithm in
Figure 1. Critically, due to the i.i.d. structure of the bridging densities $\Pi_n$ and the kernels $K_n$, each particle will
evolve according to a $d$-dimensional Markov chain $X_n$ made up of $d$ i.i.d. one-dimensional Markov chains $\{X_{n,j}\}_{n=0}^{d}$,
with $j$ the co-ordinate index, evolving under the kernel $k_n$. Also, all particles move independently.
We consider first the stability of the terminal ESS, i.e.,

$$\mathrm{ESS}_{(0,d)}(N) = \frac{\big( \sum_{i=1}^N w_d(x_{0:d-1}^i) \big)^2}{\sum_{i=1}^N w_d(x_{0:d-1}^i)^2} \tag{8}$$

where, due to the i.i.d. structure and our selection of $\phi_n$'s in (6), we can rewrite:

$$w_d(x_{0:d-1}) = \exp\Big\{ \frac{(1 - \phi_0)}{d} \sum_{j=1}^{d} \sum_{n=1}^{d} g(x_{n-1,j}) \Big\} . \tag{9}$$

It will be shown that under our set-up $\mathrm{ESS}_{(0,d)}(N)$ converges in distribution to a non-trivial variable, and we will
analytically characterise the limit; in particular we will have $\lim_{d \to \infty} E[\, \mathrm{ESS}_{(0,d)}(N) \,] \in (1, N)$.

3.1 Strategy of the Proof 

To demonstrate that the selection of the cooling sequence $\phi_n$ in (6) will control ESS we look at the behaviour of
the sum:

$$\frac{1 - \phi_0}{d} \sum_{j=1}^{d} \sum_{n=1}^{d} g(x_{n-1,j}) \tag{10}$$

appearing in the expression for the weights, $w_d(x_{0:d-1})$, in (9). Due to the nature of the expression for ESS one can
re-center, so we can consider the limiting properties of:

$$\alpha(d) = \frac{1}{\sqrt{d}} \sum_{j=1}^{d} \overline{W}_j(d) \tag{11}$$

differing from (10) only in terms of a constant (the same for all particles), where we have defined:

$$\overline{W}_j(d) = W_j(d) - E[\, W_j(d) \,] \tag{12}$$

and

$$W_j(d) = \frac{1 - \phi_0}{\sqrt{d}} \sum_{n=1}^{d} \{ g(x_{n-1,j}) - \pi_{n-1}(g) \} . \tag{13}$$

As mentioned above, the dynamics of the involved random variables correspond to those of $d$ independent scalar
non-homogeneous Markov chains $\{X_{n,j}\}_{n=0}^{d} = \{X_{n,j}(d)\}_{n=0}^{d}$ of initial position $X_{0,j} \sim \pi_0$ and evolution according
to the transition kernels $\{k_n\}_{1 \le n \le d}$. We will proceed as follows. For any fixed $d$ and co-ordinate $j$, $\{X_{n,j}\}_{n=0}^{d}$
is a non-homogeneous Markov chain of total length $d + 1$. Hence, for fixed $j$, $\{X_{n,j}\}_{d,n}$ constitutes an array of
non-homogeneous Markov chains. We will thus be using the relevant theory to prove a central limit theorem (via a
fCLT) for $W_j(d)$ as $d \to \infty$. Then, the independence of the $W_j(d)$'s over $j$ will essentially provide a central limit
theorem for $\alpha(d)$ as $d \to \infty$.

3.2 Results and Remarks for ESS 

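Let $t \in [\phi_0, 1]$ and recall the definition of $l_d(t)$ in (7). We define:

$$S_t = \frac{1 - \phi_0}{\sqrt{d}} \sum_{n=1}^{l_d(t)} \{ g(X_{n-1,j}) - \pi_{n-1}(g) \} .$$

Note that $S_1 = W_j(d)$. Our fCLT considers the continuous linear interpolation:

$$s_d(t) = S_t + \Big( d \, \frac{t - \phi_0}{1 - \phi_0} - l_d(t) \Big) [\, S_{t+} - S_t \,] ,$$

where we have denoted

$$S_{t+} = \frac{1 - \phi_0}{\sqrt{d}} \sum_{n=1}^{l_d(t)+1} \{ g(X_{n-1,j}) - \pi_{n-1}(g) \} .$$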

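Theorem 3.1 (fCLT). Assume (A1)(i)(ii), (A2) and that $g \in \mathscr{L}_{V^r}$ for some $r \in [0, \frac{1}{2})$. Then:

$$s_d(t) \Rightarrow W_{\sigma^2_{\phi_0:t}}$$

where $\{W_t\}$ is a Brownian motion and

$$\sigma^2_{\phi_0:t} = (1 - \phi_0) \int_{\phi_0}^{t} \pi_u\big( \hat{g}_u^2 - k_u(\hat{g}_u)^2 \big) \, du , \tag{14}$$

with $\hat{g}_u(\cdot)$ the unique solution of the Poisson equation:

$$g(x) - \pi_u(g) = \hat{g}_u(x) - k_u(\hat{g}_u)(x) . \tag{15}$$

In particular, $W_j(d) \Rightarrow \mathcal{N}(0, \sigma_\star^2)$ with $\sigma_\star^2 = \sigma^2_{\phi_0:1}$.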
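
To give a feel for the size of the limiting variance, here is a hedged worked example that is not part of the article. Assume, purely for illustration, that every kernel samples exactly from its invariant density, so that $k_u(f) = \pi_u(f)$; the Poisson equation is then solved by $\hat{g}_u = g - \pi_u(g)$ and the integrand in (14) reduces to the variance of $g$ under $\pi_u$. Assume further that $g(x) = -x^2/2$, so that $\pi_u$ is normal with variance $1/u$ and this variance equals $1/(2u^2)$. The sketch integrates this numerically and compares with the closed form $(1 - \phi_0)^2 / (2\phi_0)$ obtained by direct integration.

```python
def sigma2_star_gaussian(phi0, n_grid=20000):
    """Trapezoidal approximation of (1 - phi0) * integral over [phi0, 1] of 1 / (2 u^2).

    Assumes exact-sampling kernels and g(x) = -x^2 / 2 (illustrative assumptions).
    """
    h = (1.0 - phi0) / n_grid

    def var_g(u):
        # Variance of g(X) = -X^2 / 2 when X ~ N(0, 1/u).
        return 1.0 / (2.0 * u * u)

    total = 0.5 * (var_g(phi0) + var_g(1.0))
    total += sum(var_g(phi0 + i * h) for i in range(1, n_grid))
    return (1.0 - phi0) * h * total


def sigma2_star_closed_form(phi0):
    """Closed form of the same integral: (1 - phi0)^2 / (2 * phi0)."""
    return (1.0 - phi0) ** 2 / (2.0 * phi0)


if __name__ == "__main__":
    for phi0 in (0.05, 0.1, 0.5):
        print(phi0, sigma2_star_gaussian(phi0), sigma2_star_closed_form(phi0))
```

Under these assumptions the variance grows quickly as $\phi_0$ decreases, which suggests that a flatter initial density is paid for with more variable weights when no resampling is used.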

We will now need the following result on the growth of $W_j(d)$.
Lemma 3.1. Assume (A1)(i)(ii), (A2) and that $g \in \mathscr{L}_{V^r}$ for some $r \in [0, \frac{1}{2})$. Then, there exists $\delta > 0$ such that:

$$\sup_d E[\, |W_j(d)|^{2+\delta} \,] < \infty .$$

Proof. This follows from the decomposition in Theorem A.1 and the following inequality:

$$E[\, |W_j(d)|^{2+\delta} \,] \le \Big( \frac{1 - \phi_0}{\sqrt{d}} \Big)^{2+\delta} M(\delta) \big\{ E[\, |M_{0:d-1}|^{2+\delta} \,] + E[\, |R_{0:d-1}|^{2+\delta} \,] \big\} .$$

Applying the growth bounds in Theorem A.1 we get that the remainder term $E[\, |R_{0:d-1}|^{2+\delta} \,]$ is controlled as
$\pi_{\phi_0}(V^r) < \infty$ (due to $r \in [0, \frac{1}{2})$). The martingale array term $E[\, |M_{0:d-1}|^{2+\delta} \,]$ is upper bounded by $M d^{(2+\delta)/2}$,
which allows us to conclude. □

One can now obtain the general result. 

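Theorem 3.2. Assume (A1)(i)(ii), (A2). Suppose also that $g \in \mathscr{L}_{V^r}$ for some $r \in [0, \frac{1}{2})$. Then, for any fixed $N > 1$,
$\mathrm{ESS}_{(0,d)}(N)$ converges in distribution to

$$\varepsilon_N := \frac{\big[ \sum_{i=1}^N e^{X_i} \big]^2}{\sum_{i=1}^N e^{2 X_i}}$$

where $X_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma_\star^2)$ for $\sigma_\star^2$ specified in Theorem 3.1. In particular,

$$\lim_{d \to \infty} E[\, \mathrm{ESS}_{(0,d)}(N) \,] = E\bigg[ \frac{\big[ \sum_{i=1}^N e^{X_i} \big]^2}{\sum_{i=1}^N e^{2 X_i}} \bigg] . \tag{16}$$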



Proof. We will prove that $\alpha(d)$, as defined in (11), converges in distribution to $\mathcal{N}(0, \sigma_\star^2)$. The argument is standard:
it suffices to check that the random variables $\overline{W}_j(d)$, $j = 1, \ldots, d$, satisfy the Lindeberg condition and that their
second moments converge (see e.g. an adaptation of Theorem 2 of [55, p. 334]). To this end, note that $\{\overline{W}_j(d)\}_{d,j}$
form a triangular array of independent variables of zero expectation across each row. Let

$$s_d^2 = \frac{1}{d} \sum_{j=1}^{d} E[\, \overline{W}_j(d)^2 \,] = E[\, \overline{W}_1(d)^2 \,] ,$$

the last equation following from $W_j(d)$ being i.i.d. over $j$. Now, Theorem 3.1 gives that $W_1(d)$ converges in
distribution to $\mathcal{N}(0, \sigma_\star^2)$ for $d \to \infty$. Lemma 3.1 implies that (e.g. Theorem 3.5 of [15]) also the first and second
moments of $W_1(d)$ converge to $0$ and $\sigma_\star^2$ respectively; we thus obtain:

$$\lim_{d \to \infty} s_d^2 = \sigma_\star^2 . \tag{17}$$

We consider also the Lindeberg condition, and for each $\epsilon > 0$ we have:

$$\lim_{d \to \infty} \frac{1}{d} \sum_{j=1}^{d} E\big[\, \overline{W}_j(d)^2 \, \mathbb{I}_{\{|\overline{W}_j(d)| \ge \epsilon \sqrt{d}\}} \,\big] = 0 , \tag{18}$$

a result directly implied again from Lemma 3.1. Therefore, by Theorem 2 of [55, p. 334], $\alpha(d)$ converges in
distribution to $\mathcal{N}(0, \sigma_\star^2)$. In particular we have proved that:

$$(\alpha_1(d), \ldots, \alpha_N(d)) \Rightarrow \mathcal{N}_N(0, \sigma_\star^2 I_N) ,$$

where the subscripts denote the indices of the particles. The result now follows directly after noticing that

$$\mathrm{ESS}_{(0,d)}(N) = \frac{\big( \sum_{i=1}^N e^{\alpha_i(d)} \big)^2}{\sum_{i=1}^N e^{2 \alpha_i(d)}}$$

and the mapping $(\alpha_1, \alpha_2, \ldots, \alpha_N) \mapsto \big( \sum_{i=1}^N e^{\alpha_i} \big)^2 / \sum_{i=1}^N e^{2 \alpha_i}$ is bounded and continuous. □
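
The limiting ESS of Theorem 3.2 is easy to simulate for a given variance. The following minimal sketch (function names and the chosen values of the variance are illustrative assumptions) draws i.i.d. centred normals, forms the ratio of the squared sum of their exponentials to the sum of their squared exponentials, and averages over repetitions to approximate the expectation in (16).

```python
import math
import random


def limiting_ess_draw(n_particles, sigma2, rng):
    """One draw of (sum e^{X_i})^2 / sum e^{2 X_i} with X_i i.i.d. N(0, sigma2)."""
    sd = math.sqrt(sigma2)
    xs = [rng.gauss(0.0, sd) for _ in range(n_particles)]
    m = max(xs)  # the ratio is invariant to a common shift, which avoids overflow
    w = [math.exp(x - m) for x in xs]
    return sum(w) ** 2 / sum(v * v for v in w)


def expected_limiting_ess(n_particles, sigma2, n_reps=5000, seed=0):
    """Monte Carlo approximation of the expected limiting ESS."""
    rng = random.Random(seed)
    draws = (limiting_ess_draw(n_particles, sigma2, rng) for _ in range(n_reps))
    return sum(draws) / n_reps


if __name__ == "__main__":
    for sigma2 in (0.1, 1.0, 4.0):
        print(sigma2, round(expected_limiting_ess(100, sigma2), 1))
```

A variance of zero returns exactly the number of particles, and larger variances push the expected value down towards 1, which is the motivation for introducing resampling.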
3.3 Monte Carlo Error

We have shown that the choice of bridging steps as in (6) stabilises ESS in high dimensions. The error in the
estimation of expectations, which can be of even more practical interest than ESS, is now considered. In particular
we look at expectations associated with finite-dimensional marginals of the target distribution. Recall the definition
of the weight of the $i$-th particle $w_d(x_{0:d-1}^i)$ from (9), for $1 \le i \le N$. In order to consider the Monte Carlo error,
we use the result below, which is of some interest in its own right.

Proposition 3.1. Assume (A1)-(A2) and let $\varphi \in \mathscr{L}_{V^r}$ for $r \in [0, 1]$. Then we have:

$$\lim_{d \to \infty} | E[\, \varphi(X_{d,1}) \,] - \pi(\varphi) | = 0 .$$

Proof. This follows from Proposition A.1 in the Appendix when choosing time sequences $s(d) \equiv \phi_0$ and $t(d) \equiv 1$. □

Remark 3.1. The above result is interesting as it suggests one can run an alternative algorithm that just samples a 
collection of independent particles through a grid of values of the annealing parameter and average the values of the 
function of interest. However, it is not clear how such an algorithm can be validated in practice ( that is how many 
steps one should take for a finite time algorithm) and is of interest in the scenario where one fixes d and allows the 
time-steps to grow; see J<5 7| /. In our context, we are concerned with the performance of the estimator that one would 
use for fixed d ( and hence a finite number of steps in practice) from the SMC sampler in high- dimensions; it is not 
at all clear a priori that this will stabilize with a computational cost $O(Nd^2)$ and, if it does, how the error behaves.

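The Monte Carlo error result now follows; recall $\| \cdot \|_\varrho$ is defined in Section 1.3.

Theorem 3.3. Assume (A1)(i)(ii), (A2) with $g \in \mathscr{L}_{V^r}$ for some $r \in [0, \frac{1}{2})$. Then for any $1 \le \varrho < \infty$ there exists a
constant $M = M(\varrho) < \infty$ such that for any $N \ge 1$, $\varphi \in C_b(\mathbb{R})$

$$\lim_{d \to \infty} \bigg\| \sum_{i=1}^{N} \frac{w_d(X_{0:d-1}^i)}{\sum_{l=1}^{N} w_d(X_{0:d-1}^l)} \, \varphi(X_{d,1}^i) - \pi(\varphi) \bigg\|_\varrho \le \frac{M(\varrho) \|\varphi\|_\infty}{\sqrt{N}} \Big[ e^{\frac{\sigma_\star^2}{2} \varrho (\varrho - 1)} + 1 \Big]^{1/\varrho} .$$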

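Proof. Recall that the $N$ particles remain independent. From the definition of the weights in (9), we can write, up to
a multiplicative constant common to all particles (which cancels in the normalised weights),

$$w_d(X_{0:d-1}) = \exp\Big\{ \frac{1}{\sqrt{d}} \sum_{j=1}^{d} \overline{W}_j(d) \Big\}$$

for $\overline{W}_j(d)$ being i.i.d. and given in (12). Now, we have shown in the proof of Theorem 3.2 that
$\frac{1}{\sqrt{d}} \sum_{j=1}^{d} \overline{W}_j(d) \Rightarrow \mathcal{N}(0, \sigma_\star^2)$, thus:

$$w_d(X_{0:d-1}) \Rightarrow e^{X} , \quad X \sim \mathcal{N}(0, \sigma_\star^2) . \tag{19}$$

Then, from Proposition 3.1, $X_{d,1}$ converges weakly to a random variable $Z \sim \pi$. A simple argument shows that
the variables $Z$, $X$ are independent as $Z$ depends only on the first co-ordinate which will not affect (via $W_1(d)$)
the limit of $\frac{1}{\sqrt{d}} \sum_{j=1}^{d} W_j(d)$. The above results allow us to conclude (due to the boundedness and continuity of the
involved functions) that:

$$\lim_{d \to \infty} \bigg\| \sum_{i=1}^{N} \frac{w_d(X_{0:d-1}^i)}{\sum_{l=1}^{N} w_d(X_{0:d-1}^l)} \, \varphi(X_{d,1}^i) - \pi(\varphi) \bigg\|_\varrho = \bigg\| \sum_{i=1}^{N} \frac{e^{X_i}}{\sum_{l=1}^{N} e^{X_l}} \, \varphi(Z_i) - \pi(\varphi) \bigg\|_\varrho \tag{20}$$

where the $X_i$ are i.i.d. $\mathcal{N}(0, \sigma_\star^2)$ and independently $Z_i$ are i.i.d. $\pi$. Now, the limiting random variable in the $L_\varrho$-norm
on the right-hand side of (20) can be written as:

$$\frac{A_{\varphi,N}}{A_N} \, e^{-\sigma_\star^2/2} \big[ e^{\sigma_\star^2/2} - A_N \big] + e^{-\sigma_\star^2/2} \big[ A_{\varphi,N} - e^{\sigma_\star^2/2} \pi(\varphi) \big] \tag{21}$$

for $A_{\varphi,N} = \frac{1}{N} \sum_{i=1}^{N} e^{X_i} \varphi(Z_i)$ and $A_N = \frac{1}{N} \sum_{i=1}^{N} e^{X_i}$. Now, using the Marcinkiewicz-Zygmund inequality (there is
a version with $\varrho \in [1, 2)$, see e.g. [26, Chapter 7]), the $L_\varrho$-norm of the first summand in (21) is upper-bounded by:

$$\frac{M(\varrho) \|\varphi\|_\infty \, e^{-\sigma_\star^2/2}}{\sqrt{N}} \, E\big[\, | e^{X_1} - e^{\sigma_\star^2/2} |^\varrho \,\big]^{1/\varrho}$$

where $M(\varrho)$ is a constant that depends upon $\varrho$ only. Then applying the $C_p$-inequality and doing standard
calculations, this is upper-bounded by

$$\frac{M(\varrho) \|\varphi\|_\infty}{\sqrt{N}} \Big[ e^{\frac{\sigma_\star^2}{2} \varrho (\varrho - 1)} + 1 \Big]^{1/\varrho}$$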
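for some finite constant $M(\varrho)$ that only depends upon $\varrho$. For the $L_\varrho$-norm of the second summand in (21), again
after applying the Marcinkiewicz-Zygmund inequality we have the upper bound:

$$\frac{M(\varrho) \, e^{-\sigma_\star^2/2}}{\sqrt{N}} \, E\big[\, | e^{X_1} \varphi(Z_1) - e^{\sigma_\star^2/2} \pi(\varphi) |^\varrho \,\big]^{1/\varrho} .$$

Using the $C_p$-inequality and standard calculations we have the upper bound:

$$\frac{M(\varrho) \|\varphi\|_\infty}{\sqrt{N}} \Big[ e^{\frac{\sigma_\star^2}{2} \varrho (\varrho - 1)} + 1 \Big]^{1/\varrho}$$

for some finite constant $M(\varrho)$ that only depends upon $\varrho$. Thus, we can easily conclude from here. □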

4 Incorporating Resampling 

We have already shown that, even without resampling, the expected ESS converges as $d \to \infty$ to a non-trivial limit.
In practice, this limiting value could sometimes be prohibitively close to 1 depending on the value of $\sigma_\star^2$; related
to this, notice that the constant at the upper bound for the Monte Carlo error in Theorem 3.3 is an exponential
function of $\sigma_\star^2$ and could be large if $\sigma_\star^2$ is big. As a result, it makes sense to consider the option of resampling in
our analysis in high dimensions. We will see that this will result in smaller bounds for Monte Carlo estimates.

The algorithm carries out $d$ steps as in the case of the algorithm without resampling considered in Section 3, but
now resampling occurs at the instances when ESS goes below a specified threshold. For fixed $d$, the algorithm runs
in discrete time. Recalling the analogy between discrete and continuous time we have introduced in Remark 2.2,
a statement like 'resampling occurred at $t \in [\phi_0, 1]$' will literally mean that resampling took place after $l_d(t)$ steps
of the algorithm, for the mapping $l_d(t)$ between continuous and discrete instances defined in (7); in particular, the
resampling times, when considered on the continuous domain, will lie on the grid $G_d$:

$$G_d = \{ \phi_0 + n (1 - \phi_0)/d ; \; n = 1, \ldots, d \}$$

for any fixed $d$.

Assume that $s \in [\phi_0, 1]$ is a resampling time and $x'^{,1}_{l_d(s)}, \ldots, x'^{,N}_{l_d(s)}$ are the (now equally weighted) resampled
particles. Due to the i.i.d. assumptions in (4) and (5), after resampling each of these particles will evolve according
to the Markov kernels $k_{l_d(s)+1}, k_{l_d(s)+2}, \ldots$, independently over the $d$ co-ordinates and different particles. The
empirical ESS will also evolve as:

$$\mathrm{ESS}_{(s,u)}(N) = \frac{\Big( \sum_{i=1}^{N} \exp\big\{ \frac{1}{\sqrt{d}} \sum_{j=1}^{d} S^i_{s:u,j} \big\} \Big)^2}{\sum_{i=1}^{N} \exp\big\{ \frac{2}{\sqrt{d}} \sum_{j=1}^{d} S^i_{s:u,j} \big\}} \tag{22}$$

for $u \in [s, 1]$, where we have defined:

$$S^i_{s:u,j} = \frac{1 - \phi_0}{\sqrt{d}} \sum_{n=l_d(s)+1}^{l_d(u)} \{ g(x^i_{n-1,j}) - \pi_{n-1}(g) \}$$

until the next resampling instance $t > s$, whence the $N$ particles, $x^i_{l_d(t)} = (x^i_{l_d(t),1}, \ldots, x^i_{l_d(t),d})$, will be resampled
according to their weights:

$$w_{l_d(t)}(x_{l_d(s):(l_d(t)-1)}) = \exp\Big\{ \frac{1}{\sqrt{d}} \sum_{j=1}^{d} S_{s:t,j} \Big\} .$$

Note that we have modified the subscripts of ESS in (22), compared to the original definition in (2), to now run
in continuous time. It should be noted that the dynamics differ from the previous section due to the resampling
steps. For instance, $S^i_{s:u,j}$ are no longer independent over $i$ or $j$, unless one conditions on the resampled particles
$x'^{,i}_{l_d(s)}$, $1 \le i \le N$.
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4.1 Theoretical Resampling Times 



We start by showing that the dynamically resampling SMC algorithm, using a deterministic version of ESS (namely,
the expected ESS with one particle), will resample a finite number of times (again as $d \to \infty$) and also exhibit
convergence of ESS and of the Monte Carlo error. Subsequently, we show that a dynamically resampling SMC algorithm
using the empirical ESS (with some modification) will, with high probability, display the same convergence properties.

We use the resampling-times construction of [29]: this involves considering the expected value of the importance
weight, and its square, over a system with a single particle. The theoretical resampling times are defined as:

$$ t_1(d) = \inf\Big\{ t \in [\phi_0, 1] : \frac{E[\exp\{\frac{1}{\sqrt{d}} \sum_{j=1}^{d} S_{\phi_0:t,j}\}]^2}{E[\exp\{\frac{2}{\sqrt{d}} \sum_{j=1}^{d} S_{\phi_0:t,j}\}]} < a \Big\} ; \tag{23} $$

$$ t_k(d) = \inf\Big\{ t \in [t_{k-1}(d), 1] : \frac{E[\exp\{\frac{1}{\sqrt{d}} \sum_{j=1}^{d} S_{t_{k-1}(d):t,j}\}]^2}{E[\exp\{\frac{2}{\sqrt{d}} \sum_{j=1}^{d} S_{t_{k-1}(d):t,j}\}]} < a \Big\} , \quad k \ge 2 , \tag{24} $$

for a constant $a \in (0, 1)$, under the convention that $\inf \emptyset = \infty$. Note that, for most applications in practice, these
times cannot be found analytically. We emphasize here that the dynamics of $S_{s:t}$ appearing above do not involve
resampling but simply follow the evolution of a single particle with $d$ i.i.d. co-ordinates, each of which starts at
$x_{0,j} \sim \pi_0$ and then evolves according to the kernels $k_n$. Intuitively, following the ideas in [29], one could think of
the deterministic times in (23)-(24) as the limit of the resampling times of the practical SMC algorithm in Figure 1
as the number of particles $N$ increases to infinity.

We will for the moment consider the behaviour of the above times in high dimensions. Consider the following
instances:

$$ t_1 = \inf\{ t \in [\phi_0, 1] : e^{-\sigma^2_{\phi_0:t}} < a \} ; \tag{25} $$
$$ t_k = \inf\{ t \in [t_{k-1}, 1] : e^{-\sigma^2_{t_{k-1}:t}} < a \} , \quad k \ge 2 , \tag{26} $$

where for any $s < t$ in $[\phi_0, 1]$:

$$ \sigma^2_{s:t} = \sigma^2_{\phi_0:t} - \sigma^2_{\phi_0:s} = (1-\phi_0) \int_s^t \pi_u\big( \widehat{g}_u^{\,2} - k_u(\widehat{g}_u)^2 \big)\, du . \tag{27} $$

Under our standard assumptions (A1-2), and the requirement that $g \in \mathscr{L}_{V^r}$ for some $r \in [0, \tfrac{1}{2})$, we have that (using
Lemma A.1 in the Appendix):

$$ \pi_u\big( \widehat{g}_u^{\,2} - k_u(\widehat{g}_u)^2 \big) \le M \pi_u(V^{2r}) \le M' \pi_u(V) < \infty . $$

Thus, we can find a finite collection of times that dominate the $t_k$'s (in the sense that there will be more of
them), so the number of the latter is also finite and we can define:

$$ m^* = \#\{ t_k : k \ge 1 , \ t_k \in [\phi_0, 1] \} < \infty . \tag{28} $$

We have the following result.
Proposition 4.1. As $d \to \infty$ we have that $t_k(d) \to t_k$ for any $k \ge 1$.

Remark 4.1. Note that the time instances are derived only through the asymptotic variance function $t \mapsto \sigma^2_{s:t}$;
our main objective in the current resampling part of this paper will be to illustrate that investigation of these
deterministic times provides essential information about the resampling times of the practical SMC algorithm in
Figure 1. These latter stochastic times will coincide with the former (or, rather, a slightly modified version of them)
as $d \to \infty$ with a probability that converges to 1 at a rate $O(N^{-1/2})$.
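
To make the role of the variance function concrete, the limiting times can be approximated numerically once the integrand of the asymptotic variance is available. The sketch below is illustrative only: `var_density` is an assumed user-supplied stand-in for $u \mapsto \pi_u(\widehat{g}_u^{\,2} - k_u(\widehat{g}_u)^2)$, which is rarely known in closed form, and the trapezoidal grid is a convenience rather than part of the construction in the text.

```python
import numpy as np

def limiting_resampling_times(var_density, phi0, a, n_grid=10_000):
    """Approximate t_1 < t_2 < ... at which exp(-sigma^2_{t_{k-1}:t}) first drops below a.

    var_density(u) is an assumed stand-in for pi_u(ghat_u^2 - k_u(ghat_u)^2), so that
    sigma^2_{s:t} = (1 - phi0) * integral of var_density over [s, t].
    """
    u = np.linspace(phi0, 1.0, n_grid + 1)
    v = np.array([var_density(x) for x in u])
    # cumulative trapezoidal rule for sigma^2_{phi0:u}
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (v[1:] + v[:-1]) * np.diff(u))))
    cum *= (1.0 - phi0)
    times, last = [], 0.0
    budget = -np.log(a)  # exp(-sigma^2) < a  <=>  sigma^2 > log(1/a)
    for ui, ci in zip(u, cum):
        if ci - last > budget:
            times.append(ui)
            last = ci
    return times
```

For a constant `var_density` the times are equally spaced, and their number grows like the total variance divided by $\log(1/a)$.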



4.2 Stability under Theoretical Resampling Times 

Consider an SMC algorithm similar to the one in Figure 1 but with the difference that resampling occurs at the
times $\{t_k(d)\}$ in (23)-(24); it is assumed that $t_0(d) = \phi_0$. Note that, due to Proposition 4.1, the number of these
resampling times:

$$ m^*_d = \#\{ t_k(d) : k \ge 1 , \ t_k(d) \in [\phi_0, 1] \} $$

will eventually, for big enough $d$, coincide with $m^*$ in (28). We will henceforth assume that $d$ is big enough so that

$$ m^*_d = m^* < \infty . $$

We state our result in Theorem 4.1 below, under the convention that $t_{m^*+1}(d) = 1$. The proof can be found in
Appendix C.2. It relies on a novel construction of a filtration, which starts with all the information of all particles
and co-ordinates up to and including the last resampling time. Subsequent $\sigma$-algebras are generated, for a given
particle, by adding each dimension for a given trajectory. This allows one to use a Martingale CLT approach
by taking advantage of the independence of particles and co-ordinates once we condition on their positions at the
resampling times.

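Theorem 4.1. Assume (A1-2) and $g \in \mathscr{L}_{V^r}$ with $r \in [0, \tfrac{1}{2})$. Then, for any fixed $N \ge 1$, any $k \in \{1, \ldots, m^* + 1\}$,
times $t_{k-1} < t_k$, and $s_k(d) \in (t_{k-1}(d), t_k(d))$ any sequence converging to a point $s_k \in (t_{k-1}, t_k)$, we have that
$\mathrm{ESS}_{(t_{k-1}(d), s_k(d))}(N)$ converges in distribution to a random variable

$$ \frac{\big[ \sum_{i=1}^{N} e^{X_i^k} \big]^2}{\sum_{i=1}^{N} e^{2 X_i^k}} $$

where $X_i^k \overset{i.i.d.}{\sim} N(0, \sigma^2_{t_{k-1}:s_k})$ and $\sigma^2_{t_{k-1}:s_k}$ as in (27). In particular,

$$ \lim_{d \to \infty} E\big[ \mathrm{ESS}_{(t_{k-1}(d), s_k(d))}(N) \big] = E\bigg[ \frac{\big[ \sum_{i=1}^{N} e^{X_i^k} \big]^2}{\sum_{i=1}^{N} e^{2 X_i^k}} \bigg] . $$
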

Note that, had the $t_k(d)$'s been analytically available, resampling at these instances would deliver an algorithm
of $d$ bridging steps for which the expected ESS would be regularly regenerated. In addition, this latter quantity
depends, asymptotically, on the 'incremental' variances $\sigma^2_{\phi_0:t_1}, \sigma^2_{t_1:t_2}, \ldots, \sigma^2_{t_{m^*}:1}$; in contrast, in the context of
Theorem 3.2 the limiting expectation depends on $\sigma^2_{\phi_0:1} \equiv \sigma^2_{\star}$. We can also consider the Monte Carlo error when
estimating expectations w.r.t. a single marginal co-ordinate of our target. Again, the proof is in Appendix C.2.
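
The limiting ESS is a ratio of sums of log-normal variables, so its expectation is easy to approximate by simulation. The following sketch is an illustration only; the variance values, the number of particles and the function name `limiting_ess_samples` are arbitrary choices for the example, not quantities from the paper. It shows how a smaller (incremental) variance yields a larger expected ESS than a larger (total) variance.

```python
import numpy as np

def limiting_ess_samples(sigma2, n_particles, n_draws, rng):
    """Draws of [sum_i exp(X_i)]^2 / sum_i exp(2 X_i) with X_i i.i.d. N(0, sigma2)."""
    x = rng.normal(0.0, np.sqrt(sigma2), size=(n_draws, n_particles))
    x -= x.max(axis=1, keepdims=True)  # stabilise; the ratio is shift-invariant
    w = np.exp(x)
    return w.sum(axis=1) ** 2 / np.square(w).sum(axis=1)

rng = np.random.default_rng(0)
for s2 in (0.25, 1.0, 4.0):
    draws = limiting_ess_samples(s2, n_particles=100, n_draws=2000, rng=rng)
    print(f"sigma^2 = {s2}: mean limiting ESS ~ {draws.mean():.1f}")
```

Each draw lies between 1 and $N$, and the printed means decrease as the variance grows.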

Theorem 4.2. Assume (A1-2) with $g \in \mathscr{L}_{V^r}$ for some $r \in [0, \tfrac{1}{2})$. Then for any $1 \le \varrho < \infty$ there exists a constant
$M = M(\varrho) < \infty$ such that for any fixed $N \ge 1$, $\varphi \in \mathcal{C}_b(\mathbb{R})$:



lim 

d— >oo 



AT 

E 



Z d (i m ,(d)):(d-lV 



II Ei=l W <i(^ d(4m , (d)):((:i _ 

Remark 4.2. In comparison to the bound in Theorem 3.3, the bound is smaller with resampling: as $\phi_0 < t_{m^*}$, the
bound in Theorem 4.2 is clearly less than in Theorem 3.3. Whilst these are both upper bounds on the error, they are
based on the same calculations, that is, a CLT and the Marcinkiewicz-Zygmund inequality.

Remark 4.3. On inspection, the bound in the above result can be seen as counter-intuitive. Essentially, the bound
gets smaller as $t_{m^*}$ increases, i.e. the closer to the end one resamples. However, this can be explained as follows.
As shown in Proposition 3.1, the terminal point, thanks to the ergodicity of the system, is asymptotically drawn
from the correct distribution $\pi$. Thus, in the limit $d \to \infty$ the particles do not require weighting. Clearly, in finite
dimensions, one needs to assign weights to compensate for the finite run time of the algorithm.

We remark that our analysis, in the context of resampling, relies on the fact that $N$ is fixed and $d \to \infty$. If $N$
is allowed to grow as well, our analysis must be modified when one resamples. Following closely the proofs in the
Appendix, it should be possible by considering bounds (which do not increase with N and d) on quantities of the 
form 

- v ^h(t k (d))i 



< 



M{ e )\W\\ 



-g(g-l) 



N 



1] 



Ve 



E 



N 



i=i Ej=i w id(tk(.d)){ x i d (t k _ 1 (d)y.(.i d (.t k w)-i) 
to establish results also for large N; we are currently investigating this. However, at least following our arguments, 
the asymptotics under resampling will only be apparent for N much smaller than d; we believe that is only due to 
mathematical complexity and does not need to be the case. 



4.3 Practical Resampling Times 

We now consider the scenario when one resamples at the empirical versions of the times (23)-(24). To this end, we
will follow closely the proofs of [29], and this will require the consideration of a finite mesh in the definition of the
resampling times. Consider some positive integer $\delta$, and the grid:

$$ G_\delta = \{ \phi_0, \ \phi_0 + (1 - \phi_0)/\delta, \ \phi_0 + 2(1 - \phi_0)/\delta, \ \ldots, \ 1 \} . $$

We consider the SMC algorithm that attempts to resample only when crossing the instances of the grid $G_\delta$, using
the practically relevant empirical ESS. That is, we are interested in the times $\{T_k = T_k(d)\}$ defined as:

$$ T_1 = \inf\{ t \in G_\delta \cap [\phi_0, 1] : \tfrac{1}{N} \mathrm{ESS}_{(\phi_0, t)}(N) < a_1 \} ; $$

$$ T_k = \inf\{ t \in G_\delta \cap [T_{k-1}, 1] : \tfrac{1}{N} \mathrm{ESS}_{(T_{k-1}, t)}(N) < a_k \} , \quad k \ge 2 , $$

for a collection of thresholds $(a_k)$ in $(0, 1)$.
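
The mesh-based rule can be illustrated as follows. This is a deliberately simplified sketch that only tracks weights: the array `log_incr`, the mapping from mesh points to step indices, and the function name are assumptions for the example, and the effect of resampling on the subsequent particle dynamics is not simulated.

```python
import numpy as np

def practical_resampling_times(log_incr, phi0, delta, thresholds):
    """Times T_k on the mesh G_delta at which ESS/N first drops below a_k.

    log_incr[i, n] is an assumed array holding the log-weight increment of
    particle i at step n + 1; weights are reset after each resampling time.
    """
    n_particles, d = log_incr.shape
    # step index reached at each mesh point (assumed linear time change)
    mesh_steps = [round(m * d / delta) for m in range(1, delta + 1)]
    log_w = np.zeros(n_particles)
    times, k, prev = [], 0, 0
    for step in mesh_steps:
        log_w += log_incr[:, prev:step].sum(axis=1)
        prev = step
        w = np.exp(log_w - log_w.max())
        ess_frac = w.sum() ** 2 / np.square(w).sum() / n_particles
        if k < len(thresholds) and ess_frac < thresholds[k]:
            times.append(phi0 + (1.0 - phi0) * step / d)
            log_w[:] = 0.0  # equally weighted particles after resampling
            k += 1
    return times
```

Checking the criterion only at the $\delta$ mesh points, rather than at every one of the $d$ steps, is what keeps the number of candidate resampling times fixed as $d$ grows.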

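Following the development in [29], we will need the following theoretical times:

$$ t^\delta_1(d) = \inf\Big\{ t \in G_\delta \cap [\phi_0, 1] : \frac{E[\exp\{\frac{1}{\sqrt{d}} \sum_{j=1}^{d} S_{\phi_0:t,j}\}]^2}{E[\exp\{\frac{2}{\sqrt{d}} \sum_{j=1}^{d} S_{\phi_0:t,j}\}]} < a_1 \Big\} ; $$

$$ t^\delta_k(d) = \inf\Big\{ t \in G_\delta \cap [t^\delta_{k-1}(d), 1] : \frac{E[\exp\{\frac{1}{\sqrt{d}} \sum_{j=1}^{d} S_{t^\delta_{k-1}(d):t,j}\}]^2}{E[\exp\{\frac{2}{\sqrt{d}} \sum_{j=1}^{d} S_{t^\delta_{k-1}(d):t,j}\}]} < a_k \Big\} , \quad k \ge 2 . $$

We can, for a moment, obtain an understanding of the behavior of these times as $d \to \infty$. Define the time instances:

$$ t^\delta_1 = \inf\{ t \in G_\delta \cap [\phi_0, 1] : e^{-\sigma^2_{\phi_0:t}} < a_1 \} ; $$

$$ t^\delta_k = \inf\{ t \in G_\delta \cap [t^\delta_{k-1}, 1] : e^{-\sigma^2_{t^\delta_{k-1}:t}} < a_k \} , \quad k \ge 2 . $$

If $m^*(\delta)$ denotes the number of these times, we have that $m^*(\delta) \le m^*$ (with $m^*$ now taking into account the choices
of different thresholds $a_k$), but for $\delta$ large enough these values will be very close.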

Proposition 4.2. As $d \to \infty$ we have that $t^\delta_k(d) \to t^\delta_k$ for any $k \ge 1$.

Proof. The proof of $t_1(d) \to t_1$ in Proposition 4.1 is based on showing uniform convergence of

$$ t \mapsto \frac{E[\exp\{\frac{1}{\sqrt{d}} \sum_{j=1}^{d} S_{\phi_0:t,j}\}]^2}{E[\exp\{\frac{2}{\sqrt{d}} \sum_{j=1}^{d} S_{\phi_0:t,j}\}]} $$

to $t \mapsto e^{-\sigma^2_{\phi_0:t}}$. Repeating this argument also for subsequent time instances gave that $t_k(d) \to t_k$ for all relevant
$k \ge 1$. This uniform convergence result can now be called upon to provide the proof of the current proposition. □

Also, Theorems 4.1 and 4.2 hold under these modified times on $G_\delta$.

Main Result and Interpretation

We will use the construction in [29]. The results therein determine the behavior of the SMC method for $d$ fixed and
increasing number of particles $N$, as described in the sequel. Define, for a given $\upsilon \in (0, 1)$, the following event:

$$ \Omega^N_d = \Omega^N_d(\upsilon, \{a_k\}_{1 \le k \le m^*(\delta)}) := \Big\{ \text{for all } 1 \le k \le m^*(\delta), \ s \in G_\delta \cap [t^\delta_{k-1}(d), t^\delta_k(d)] : $$
$$ \big| \tfrac{1}{N} \mathrm{ESS}_{(t^\delta_{k-1}(d), s)}(N) - \mathrm{ESS}_{(t^\delta_{k-1}(d), s)} \big| < \upsilon \, \big| \mathrm{ESS}_{(t^\delta_{k-1}(d), s)} - a_k \big| \Big\} $$

where

$$ \mathrm{ESS}_{(t^\delta_{k-1}(d), s)} = \frac{E[\exp\{\frac{1}{\sqrt{d}} \sum_{j=1}^{d} S_{t^\delta_{k-1}(d):s,j}\}]^2}{E[\exp\{\frac{2}{\sqrt{d}} \sum_{j=1}^{d} S_{t^\delta_{k-1}(d):s,j}\}]} $$

corresponds to the expected ESS over a single particle involved in the definition of $\{t^\delta_k(d)\}$. Here $(a_k)_{1 \le k \le m^*}$ are
a collection of thresholds which are sampled from some absolutely continuous distribution; they are determined in
such a way as to avoid the degenerate situation when the thresholds $a_k$ coincide with ESS; see [29] for details. Now,
the definition of $\Omega^N_d$ implies the following:

1. Within $\Omega^N_d$, if the deterministic resampling criteria tell us to resample, so do the empirical ones. That is:

$$ \mathrm{ESS}_{(t^\delta_{k-1}(d), s)} > a_k \ \Rightarrow \ \tfrac{1}{N} \mathrm{ESS}_{(t^\delta_{k-1}(d), s)}(N) > a_k , \quad s \in G_\delta \cap [t^\delta_{k-1}(d), t^\delta_k(d)] $$

and

$$ \mathrm{ESS}_{(t^\delta_{k-1}(d), s)} < a_k \ \Rightarrow \ \tfrac{1}{N} \mathrm{ESS}_{(t^\delta_{k-1}(d), s)}(N) < a_k , \quad s \in G_\delta \cap [t^\delta_{k-1}(d), t^\delta_k(d)] . $$

2. A consequence of the above is that (this is Proposition 5.3 of [29]):

$$ \bigcap_{1 \le k \le m^*(\delta)} \{ T_k = t^\delta_k(d) \} \supseteq \Omega^N_d . $$

3. Conditionally on $\{a_k\}_{1 \le k \le m^*(\delta)}$, we have that $P[\Omega \setminus \Omega^N_d] \to 0$ as $N$ grows [29, Theorem 5.4] ($d$ is fixed).

The above results provide the interpretation that, with a probability that increases to 1 with $N$, the theoretical
resampling times $\{t^\delta_k(d)\}$ will coincide with the practical $\{T_k = T^N_k(d)\}$, for any fixed dimension $d$.

Our own contribution involves looking at the stability of these results as the dimension grows, $d \to \infty$.

Theorem 4.3. Assume (A1-2) and that $g \in \mathscr{L}_{V^r}$, with $r \in [0, \tfrac{1}{2})$. Conditionally on almost every realization of the
random threshold parameters $\{a_k\}$, there exists an $M = M(m^*(\delta)) < \infty$ such that for any $1 \le N < \infty$, we have

$$ \lim_{d \to \infty} P[\Omega \setminus \Omega^N_d] \le \frac{M}{\sqrt{N}} . $$

The proof in Appendix C.3 focuses on point 3. above of the results in [29]. Thus, investigation of the times $\{t^\delta_k\}$
involving only the asymptotic variance function $\sigma^2_{s:t}$ can provide an understanding of the number and location of the
resampling times of the practical algorithm that uses the empirical ESS. This is because, with a high probability that
depends on the number of particles (uniformly in $d$), the practical resampling times will coincide with $\{t^\delta_k(d)\}$.

5 Example on Symmetric Random Walk 

We will now verify assumptions (A1-2) when the $\pi_s$-invariant transition kernel is a Random-Walk Metropolis (RWM)
algorithm, with proposed increments $N(0, 1/s)$. That is:

$$ q_s(x, dy) = \sqrt{\tfrac{s}{2\pi}} \, e^{-\frac{s}{2}(y - x)^2} \, dy $$

with acceptance probability:

$$ a_s(x, y) = 1 \wedge \frac{\pi_s(y)}{\pi_s(x)} . $$

For simplicity we set $q_s(dy) = q_s(0, dy)$. That is, we will look at the Markov kernel:

$$ k_s(x, dy) = a_s(x, y) \, q_s(x, dy) + \delta_x(dy) \int_E (1 - a_s(x, y)) \, q_s(x, dy) . \tag{29} $$

Notice that we assume that the variance of the proposal is $1/s$, $s \in [\phi_0, 1]$. One can use $f(s)^{-1}$ for the proposal
variance, where $f$ is a bounded positive continuous function that is monotonically increasing with a bounded
derivative. This is omitted only for notational clarity; using $f$ in the proofs would only complicate the subsequent
notation.
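
A single move of this kernel can be sketched as below, using the form $\pi_s(x) \propto \exp\{s\, g(x)\}$ recalled in the Appendix. The function name `rwm_step` and the vectorised treatment of independent co-ordinates are choices made for the illustration, not part of the paper.

```python
import numpy as np

def rwm_step(x, s, g, rng):
    """One RWM move leaving pi_s(x) ∝ exp(s * g(x)) invariant.

    Proposal: y = x + N(0, 1/s); accept with probability 1 ∧ pi_s(y)/pi_s(x).
    Acts elementwise on an array of independent scalar co-ordinates.
    """
    x = np.asarray(x, dtype=float)
    y = x + rng.normal(size=x.shape) / np.sqrt(s)
    log_ratio = s * (g(y) - g(x))
    accept = np.log(rng.uniform(size=x.shape)) < log_ratio
    return np.where(accept, y, x)
```

For example, with `g = lambda z: -0.5 * z**2` and `s = 0.5`, repeated calls drive each co-ordinate towards a Gaussian law with variance 2.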

We will assume that for every $s \in [\phi_0, 1]$ one has

• $\pi_s$ is bounded away from zero on compact sets and is upper-bounded.

• $\pi_s$ is super-exponential with asymptotically regular contours; see [37] for details.
We will add the condition



C* := sup <^ / G(x,z)q s {z)dz \ < +oo (30) 

with $G(x, z) = g(x) - g(x + z) > 0$ on $A(x)^c$ (see (59) for details on $A(x)$). This assumption is used to simplify
some calculations in the proof and is verifiable (see Remark 5.1). The above assumptions will be termed (E) in the
following proposition. The proof can be found in Appendix D.

Proposition 5.1. Assume (E). Then the symmetric random walk kernel (29) satisfies (A1-2).

Remark 5.1. It is straightforward to verify (A1) using standard results in the literature. However, (A2) is non-standard,
due to the difference of invariant measures present in (29). Note, for (30), that if $g(x) = -x^2/2$ then
$G(x, z) = \tfrac{1}{2} [z^2 + 2xz]$. Hence we have



'A(x 

Thus, assumption (30) will hold in the Gaussian case.



f 11 

/ G(x,z)q s (z)dz < — < — . 
6 Discussion and Extensions



We now discuss the general context of our results, provide some extra results and look at potential generalizations. 



6.1 On the Number of Bridging Steps 

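Our analysis has relied on using $O(d)$ bridging steps. An important question is what happens when one has more
or fewer time steps. We restrict our discussion to the case where one does not resample, but one can easily extend the
results to the resampling scenario. Suppose one takes $\lfloor d^{1+\delta} \rfloor$ steps, for some real $\delta > -1$, and annealing sequence:

$$ \phi_n = \phi_0 + \frac{n (1 - \phi_0)}{\lfloor d^{1+\delta} \rfloor} , \quad n \in \{0, \ldots, \lfloor d^{1+\delta} \rfloor\} . $$

We are to consider the weak convergence of the centered log-weights, which are now equal to:

$$ \frac{\sqrt{d}}{\lfloor d^{1+\delta} \rfloor^{1/2}} \, \alpha_i(d) $$

where we have defined

$$ \alpha_i(d) = \frac{1}{\sqrt{d}} \sum_{j=1}^{d} \overline{W}_{ij}(d) , \quad \overline{W}_{ij}(d) = W_{ij}(d) - E[W_{ij}(d)] , $$

with $i \in \{1, \ldots, N\}$ and

$$ W_j(d) = \frac{1 - \phi_0}{\lfloor d^{1+\delta} \rfloor^{1/2}} \sum_{n=1}^{\lfloor d^{1+\delta} \rfloor} \{ g(X_{n-1,j}) - \pi_{n-1}(g) \} . \tag{31} $$

One can follow the arguments of Theorem 3.2 to deduce that, under our conditions:

$$ (\alpha_1(d), \ldots, \alpha_N(d)) \Rightarrow N(0, \sigma^2_{\star} I_N) . $$

This observation can then be used to provide the following result.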

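Corollary 6.1. Assume (A1(i)(ii), A2) and that $g \in \mathscr{L}_{V^r}$ for some $r \in [0, \tfrac{1}{2})$. Then, for any fixed $N \ge 1$:

• If $\delta > 0$ then $\mathrm{ESS}_{(0, \lfloor d^{1+\delta} \rfloor)}(N) \to_P N$.

• If $-1 < \delta < 0$ then $\mathrm{ESS}_{(0, \lfloor d^{1+\delta} \rfloor)}(N) \to_P 1$.

Proof. Following the weak convergence of the $\alpha_i(d)$ deduced above, if $\delta > 0$ then we have that
$\frac{\sqrt{d}}{\lfloor d^{1+\delta} \rfloor^{1/2}} \alpha_i(d) \to_P 0$. All particles are independent, so the proof
of the ESS convergence follows easily.

For the case when $-1 < \delta < 0$ we work as follows. We consider the maximum $M(d) = \max\{\alpha_i(d) ; 1 \le i \le N\}$.
Let $\alpha_{(1)}(d) \le \alpha_{(2)}(d) \le \cdots \le \alpha_{(N)}(d)$ denote the ordering of the variables $\alpha_1(d) - M(d), \alpha_2(d) - M(d), \ldots, \alpha_N(d) - M(d)$.
We have that (setting for notational convenience $f_d := \sqrt{d} / \lfloor d^{1+\delta} \rfloor^{1/2}$):

$$ \mathrm{ESS}_{(0, \lfloor d^{1+\delta} \rfloor)}(N) = \frac{\big( 1 + \sum_{i=1}^{N-1} e^{\alpha_{(i)}(d) f_d} \big)^2}{1 + \sum_{i=1}^{N-1} e^{2 \alpha_{(i)}(d) f_d}} . \tag{32} $$

Due to the continuity of the involved mappings, the fact that $(\alpha_1(d), \ldots, \alpha_N(d)) \Rightarrow N(0, \sigma^2_{\star} I_N)$ implies the weak
limit $(\alpha_{(1)}(d), \ldots, \alpha_{(N-1)}(d)) \Rightarrow (\alpha_{(1)}, \ldots, \alpha_{(N-1)})$ as $d \to \infty$, with the latter variables denoting the ordering
$\alpha_{(1)} \le \alpha_{(2)} \le \cdots \le \alpha_{(N)}$ of $\alpha_1 - M, \alpha_2 - M, \ldots, \alpha_N - M$, where the $\alpha_i$'s are i.i.d. from $N(0, \sigma^2_{\star})$ and $M$
is their maximum. Since $(\alpha_{(1)}(d), \ldots, \alpha_{(N-1)}(d))$ and their weak limit take a.s. negative values, we have that
$(\alpha_{(1)}(d) f_d, \ldots, \alpha_{(N-1)}(d) f_d) \Rightarrow (-\infty, \ldots, -\infty)$, which (continuing from (32)) implies the stated result. □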



For the stable scenario, with $\delta > 0$, we also have the following.

Corollary 6.2. Assume (A1(i)(ii)), with $g \in \mathscr{L}_{V^r}$ for some $r \in [0, \tfrac{1}{2})$. Then for any $1 \le \varrho < \infty$, $N \ge 1$,

$\varphi \in \mathcal{C}_b(\mathbb{R})$, $\delta > 0$:



lim 

d— ¥ 00 



W d( X 0:Lrfi+*J-l) 
i=l 2i=l u; d(-X"o:|d 1 +' ! J-l) 



iV 

E 



1 N 



where $X_i \sim \pi$.

Proof. This follows from the proof of Theorem 3.3 and Corollary 6.1. □

Thus, a number of steps of $O(d)$ is a critical regime: fewer than this will lead to the algorithm collapsing w.r.t. the
ESS, and more steps is 'too much' effort, as one obtains very favourable results.
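
The three regimes can be visualised with a small numerical experiment. This is only an illustration of the scaling argument: the vector `alpha` is drawn directly from a standard Gaussian as a stand-in for the limiting centered log-weights, and the values of `d`, `N = 50` and `delta` are arbitrary.

```python
import numpy as np

def ess_scaled(alpha, f):
    """ESS of weights exp(f * alpha_i); f plays the role of the scaling f_d."""
    z = f * (alpha - alpha.max())  # subtracting the maximum leaves the ESS unchanged
    w = np.exp(z)
    return w.sum() ** 2 / np.square(w).sum()

rng = np.random.default_rng(1)
alpha = rng.normal(size=50)  # stand-in for the limiting Gaussian log-weights
d = 10_000
for delta in (-0.5, 0.0, 0.5):
    f_d = np.sqrt(d / np.floor(d ** (1 + delta)))
    print(delta, f_d, ess_scaled(alpha, f_d))
```

With more than $O(d)$ steps the scaling factor shrinks and the ESS approaches $N$; with fewer steps the factor blows up and the largest weight dominates.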

6.2 Full-Dimensional Kernels 

An important open problem is the investigation of the stability properties of SMC as $d \to \infty$ when one uses
full-dimensional kernels $K_n(x, dx')$, instead of a product of univariate kernels considered in our analysis. We will state
a conjecture for this case here, indicating the increased technical complexity relative to the scenario of this article and
sketching future research in this direction. We remain in the i.i.d. context for the target density and do not consider
resampling for ease of presentation. Consider the Markov kernel $P_n(x, dx')$ with invariant density $\Pi_n$ corresponding
to RWM with proposal dynamics ($Z \sim N_d(0, I_d)$):

$$ x^{pr} = x + \sqrt{h} \, Z ; \quad h = \frac{l^2}{d} $$

so that $X' = x^{pr}$ with probability $a(x, x^{pr}) = 1 \wedge \{\Pi_n(x^{pr}) / \Pi_n(x)\}$; otherwise $X' = x$. The particular choice of
step-size $h$ shown in the proposal above, as an order of $d$, was found in the MCMC literature ( |51l 1521 P7]) to provide
algorithms that do not degenerate as $d$ increases.

We consider the standard SMC method in Figure 1 under the choice of kernels $K_n = (P_n)^d$ for RWM, so that
at each instance $n$ we synthesize $d$ steps from $P_n(x, dx')$. We conjecture that this choice for $K_n(x, dx')$ will provide
a stable SMC method as $d \to \infty$. Some of the fundamental building blocks of our analysis for the asymptotic
properties of the ESS when using product kernels in the previous sections are: (i) the independence over the $d$
co-ordinates; (ii) each co-ordinate is making $O(1)$-steps in its state space with dynamics of appropriate ergodicity
properties. As analytically explained in the aforementioned MCMC literature, the convolution of $d$ steps for RWM
provides, asymptotically, independence between the co-ordinates, with each co-ordinate making (essentially) $d$ steps
of size $1/d$ along the path (over the time period $[0, 1]$) of the following limiting scalar SDE:

$$ dY_n(t) = \frac{a_n(l)\, l^2}{2} (\log \pi_n)'(Y_n(t)) \, dt + \sqrt{a_n(l)} \, l \, dW_t \tag{33} $$

with $a_n(l) = \lim_{d \to \infty} E[a(X, X^{pr})] \in (0, 1)$; the expectation is with $X$ in stationarity, $X \sim \Pi_n$. Thus, we conjecture
that, when considering the centered log-weights:

$$ \frac{1}{\sqrt{d}} \sum_{j=1}^{d} \frac{\sum_{n=1}^{d} \{ g(x_{n-1,j}) - \pi_{n-1}(g) \}}{\sqrt{d}} \tag{34} $$

their weak limit would remain unchanged if the dynamics of the Markov chain with kernels $K_n = (P_n)^d$ are replaced
with those of a Markov chain with $K^*_n(x, dx') = \prod_{j=1}^{d} k^*_n(x_j, dx'_j)$, where $k^*_n(x_j, dx'_j) = P[Y_n(1) \in dx'_j \mid Y_n(0) = x_j]$
is the transition density of the SDE (33). Now, under these dynamics, we are within the context of our main results
in Section 3 and, under the assumptions stated there, we can prove weak convergence of (34) to $N(0, \sigma^2_{\star})$ for $\sigma^2_{\star}$
now involving the continuum $k^*_n(x_j, dx'_j)$ of the SDE transition densities.
Thus, the technical challenge left for future research is proving that:

$$ \frac{1}{d} \sum_{n,j=1}^{d} \{ g(x_{n-1,j}) - g(y_{n-1,j}(1)) \} \to 0 , $$

which requires coupling the probability measures $\Pi_0 K_1 \cdots K_n$ and $\Pi_0 K^*_1 \cdots K^*_n$ determining the dynamics of the
time-inhomogeneous $d$-dimensional Markov chains $\{x_0, x_1, \ldots, x_d\}$ and $\{y_0(1), y_1(1), \ldots, y_d(1)\}$ respectively. That
is to say, a coupling between the $d$ steps of RWM, $K_n = (P_n)^d$, and the sample paths of the limiting diffusions
determining $K^*_n$. This is certainly a non-trivial task that will go beyond the aforementioned MCMC literature, as
the limiting results there are based on convergence of generators and do not require strong path-wise convergence.

Under our conjecture, the SMC method based on full-dimensional RWM kernels will stabilize at a total cost
of $O(Nd^3)$. A similar conjecture for MALA (Metropolis-adjusted Langevin algorithm) will involve stability of the
SMC method at a reduced cost of $O(Nd^{7/3})$, as for MALA one has to synthesize $O(d^{1/3})$ steps of size $O(d^{-1/3})$
to obtain the diffusion limit (see [S5]). Finally, we conjecture that an alternative SMC method that uses $O(d^2)$
bridging steps ($\phi_n = \phi_0 + n(1 - \phi_0)/d^2$) with RWM transition kernels of step-sizes $h = l^2/d$ as before, instead of
convoluting $d$ full-dimensional kernels (for MALA, that would involve using $O(d^{4/3})$ bridging steps), will also be
stable for fixed $N$ as $d$ increases. This is because of the structural similarity of its dynamics for blocks of $d$
bridging steps with the previous case; however, a proof for this case does not seem to be connected with the work in
our paper and will have to follow a different direction. An analytical solution to this latter issue, by consideration
of the variances in the CLT, may help to answer whether one should iterate the MCMC kernel or have more
annealing steps in high-dimensional scenarios.

6.3 Beyond I.I.D. Targets 

In the MCMC literature, the first attempts to move beyond the i.i.d. context involved looking at restricted classes
of models, see e.g. [T71 [TH \7\. The most recent contributions in this still-open research direction have looked
at target distributions in high dimensions defined as changes of measure from Gaussian laws ([12, 45, 49]). This
probabilistic structure contains a large family of practically relevant statistical models (see e.g. [13]). We will discuss
an extension of our results in this paper in such a direction. Following [45, 49], we consider a target distribution on
an infinite-dimensional separable Hilbert space $\mathcal{H}$ determined via the change of measure:

$$ \frac{d\Pi}{d\Pi_0}(x) \propto \exp\{ -\Psi(x) \} , \quad x \in \mathcal{H} , $$

for some functional $\Psi : \mathcal{H} \mapsto \mathbb{R}$, with $\Pi_0 = N(0, C)$ a Gaussian law on $\mathcal{H}$. Let $\{e_j\}_{j \in \mathbb{N}}$ be the orthonormal basis
of $\mathcal{H}$ comprised of eigenvectors of $C$ with corresponding eigenvalues $\{\lambda_j^2\}_{j \in \mathbb{N}}$. $\Pi_0$ can be expressed in terms of its
so-called Karhunen-Loève expansion:

$$ \Pi_0 \overset{law}{=} \sum_{j=1}^{\infty} \lambda_j \, \xi_j \, e_j $$

where $\xi_j \overset{i.i.d.}{\sim} N(0, 1)$. In practice, one will have to project the target to some $d$-dimensional approximation, and a
standard generic approach is to truncate the basis expansion; that is, to work with the $d$-dimensional target:

$$ \Pi(x) \propto \exp\{ -\Psi_d(x) - \tfrac{1}{2} \langle x, C_d^{-1} x \rangle \} , \quad x \in \mathbb{R}^d , $$

with $C_d = \mathrm{diag}\{\lambda_1^2, \ldots, \lambda_d^2\}$ and $\Psi_d(x) = \Psi(\sum_{j=1}^{d} x_j e_j)$.
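
The truncated construction is straightforward to express in code. The sketch below is a minimal illustration under stated assumptions: the eigenvalue sequence `lam` and the functional `psi_d` are user-supplied placeholders, the co-ordinates are expressed in the eigenbasis $\{e_j\}$, and the function names are hypothetical.

```python
import numpy as np

def sample_truncated_prior(lam, n_samples, rng):
    """Coefficients x_j = lam_j * xi_j, xi_j ~ N(0, 1), for the first d = len(lam) modes."""
    lam = np.asarray(lam, dtype=float)
    return lam * rng.normal(size=(n_samples, lam.size))

def log_target(x, lam, psi_d):
    """Unnormalised log-density -psi_d(x) - 0.5 * <x, C_d^{-1} x>, with C_d = diag(lam^2)."""
    lam = np.asarray(lam, dtype=float)
    return -psi_d(x) - 0.5 * np.sum((x / lam) ** 2, axis=-1)
```

For instance, `lam = 1 / np.arange(1, d + 1)` gives a trace-class covariance as $d$ grows, and `psi_d = lambda x: 0.0` recovers the Gaussian reference law.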

In connection with the SMC method in this paper, we will look at the algorithm in Figure 1 with bridging densities
$\Pi_n(x) \propto \{\Pi(x)\}^{\phi_n}$, where $\phi_n = \phi_0 + n(1 - \phi_0)/d$, and propagating kernels $K_n = (P_n)^d$, with $P_n$ corresponding
to the Markov transition of a RWM algorithm with target distribution $\Pi_n$ and proposal:

$$ X^{pr} = X + \sqrt{h} \, C_d^{1/2} Z ; \quad h = \frac{l^2}{d} , $$

with $Z \sim N_d(0, I_d)$. Again, we do not consider the possibility of resampling, only for notational simplicity. Our
conjecture here is that this SMC method will be stable as $d \to \infty$, for a fixed number of particles $N$, at a total
computational cost $O(Nd^3)$. In a similar context to Section 6.2, it is shown in [15] that the above choice of step-size
$h$ provides a non-degenerate MCMC algorithm as $d \to \infty$. More analytically, asymptotically in $d$, the $d$ steps of
Markov transitions $P_n$ correspond to making steps of size $1/d$ on the paths of an $\mathcal{H}$-valued SDE. The centered
log-weights will now be:

$$ \frac{1 - \phi_0}{d} \sum_{n=1}^{d} \Big( -\Psi_d(X_{n-1}) + E[\Psi_d(X_{n-1})] - \tfrac{1}{2} \langle X_{n-1}, C_d^{-1} X_{n-1} \rangle + \tfrac{1}{2} E[\langle X_{n-1}, C_d^{-1} X_{n-1} \rangle] \Big) $$

with $X_n \mid X_{n-1} = x_{n-1} \sim K_n(x_{n-1}, \cdot)$. We conjecture here that, starting from a $d$-variate version of the Poisson
equation (a generalisation of the univariate version for the results proven in this paper), one should aim at showing:

$$ \frac{1}{d} \sum_{n=1}^{d} \{ \Psi_d(x_{n-1}) - E[\Psi_d(X_{n-1})] \} \to 0 ; $$

d 

n— 1 

for some asymptotic variance $\sigma^2$. For the first limit, one should consider a Poisson equation associated to the
functional $x \mapsto \Psi_d(x)$, for the Markov chain with dynamics $K_n$. For the second limit, the $d$-variate Poisson
equation should apply upon the functional $x \mapsto \langle x, C_d^{-1} x \rangle / \sqrt{d}$. Both these functionals seem to stabilize as $d \to \infty$.
The asymptotic variance $\sigma^2$ is expected to involve an integral over the transition density of the limiting $\mathcal{H}$-valued
SDEs.

6.4 Some New Results



An important application of SMC samplers is in the approximation of the normalizing constant of $\Pi$. This is a
non-trivial extension of the work in this article, but we have obtained the stability in high dimensions of the relative
$L_2$-error of the SMC estimate; we refer the reader to [10]. This stability is achieved with a computational cost of
$O(Nd^2)$ with stronger assumptions than in this article.

Recall that we have used the annealing sequence (5). However, one could also consider a general differentiable,
increasing Lipschitz function $\phi(s)$, $s \in [0, 1]$, with $\phi(0) = \phi_0 > 0$, $\phi(1) = 1$, and use the construction $\phi_{n,d} = \phi(n/d)$;
this is also considered in [10]. The asymptotic results generalized to the choice of $\phi_{n,d}$ here would involve the
variances:

$$ \sigma^2_{s:t} = \int_s^t \pi_{\phi(u)}\big( \widehat{g}_{\phi(u)}^{\,2} - k_{\phi(u)}(\widehat{g}_{\phi(u)})^2 \big) \Big[ \frac{d\phi(u)}{du} \Big]^2 du , \quad 0 \le s < t \le 1 . $$

So for example the bound in Theorem 3.3 becomes

M(Q)\\<P\\oo r$kp(p-l) , 

In theory one could use this quantity to choose between SMC algorithms with different annealing schemes; see [10]
for some discussion.

An interesting avenue to pursue is the stability of the SMC approximation of multi-level Feynman-Kac formulae
[26]. This is particularly important for problems in rare-events analysis. In this case one introduces a sequence of
sets which converge to the rare region of interest. The question is how to parameterize the sets such that, as one
makes the set of interest rarer, the algorithm is stable (e.g. w.r.t. logarithmic efficiency). We suggest [TH] and [35]
from the splitting literature as useful starting points. It may also be of interest to investigate more advanced SMC
samplers such as [551 123].



A Technical Results 

In this appendix we provide some technical results that will be used in the proofs that follow. The results in Lemma
A.1 are fairly standard within the context of the analysis of non-homogeneous Markov chains with drift conditions
(e.g. [32]). The decomposition in Theorem A.1 will be used repeatedly in the proofs.

For a starting index $n_0 = n_0(d)$ we denote here by $\{X_n(d) ; n_0 \le n \le d\}$ the non-homogeneous scalar Markov
chain evolving via:

$$ P[X_n(d) \in dy \mid X_{n-1}(d) = x] = k_{n,d}(x, dy) , \quad n_0 < n \le d , $$

with the kernels $k_{n,d}$ preserving $\pi_{n,d}$. All variables $X_n(d)$ take values in the homogeneous measurable space $(E, \mathcal{E}) =
(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. For simplicity, we will often omit indexing the above quantities with $d$.

Given the Markov kernel $k_s$ with invariant distribution $\pi_s$ (here, $s \in [\phi_0, 1]$), and some function $\varphi$, we consider
the Poisson equation

$$ \varphi(x) - \pi_s(\varphi) = f(x) - k_s(f)(x) ; $$

under (A1) there is a unique solution $f(\cdot)$ (see e.g. [47]), which can be expressed via the infinite series
$f(x) = \sum_{l \ge 0} [k_s^l - \pi_s](\varphi)(x)$. We use the notation $f = \mathcal{P}(\varphi, k_s, \pi_s)$ to define the solution of such an equation.
We will sometimes use the notation $E_{X_{n_0}}[\,\cdot\,] = E[\,\cdot \mid X_{n_0}]$.
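
On a finite state space the Poisson equation and its series solution can be checked directly. The sketch below is an illustration, not part of the paper: the transition matrix `K`, the function `phi`, the truncation level `n_terms` and the function name are assumptions for the example, and the chain is assumed ergodic so that the series converges geometrically.

```python
import numpy as np

def poisson_solution(K, phi, n_terms=500):
    """Series solution f = sum_{l>=0} [K^l - pi](phi) of f - K f = phi - pi(phi)."""
    K = np.asarray(K, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # stationary distribution: left eigenvector of K for eigenvalue 1
    vals, vecs = np.linalg.eig(K.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()
    centred = phi - pi @ phi  # phi - pi(phi)
    f, term = np.zeros_like(phi), centred.copy()
    for _ in range(n_terms):  # term holds K^l applied to the centred function
        f += term
        term = K @ term
    return f, pi
```

Because $K^l$ maps constants to themselves, $[K^l - \pi](\varphi) = K^l(\varphi - \pi(\varphi))$, which is what the loop accumulates.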

Lemma A.1. Assume (A1-2). Then, the following results hold.

i) Let $\varphi \in \mathscr{L}_{V^r}$ for some $r \in [0, 1]$ and set $\widehat{\varphi} = \mathcal{P}(\varphi, k_s, \pi_s)$. Then, there exists $M = M(r)$ such that

$$ |\widehat{\varphi}(x)| \le M \, |\varphi|_{V^r} \, V(x)^r . $$

ii) Let $\varphi_s, \varphi_t \in \mathscr{L}_{V^r}$ for some $r \in [0, 1]$ and consider $\widehat{\varphi}_s = \mathcal{P}(\varphi_s, k_s, \pi_s)$ and $\widehat{\varphi}_t = \mathcal{P}(\varphi_t, k_t, \pi_t)$. Then, there
exists $M = M(r)$ such that:

$$ |\widehat{\varphi}_t(x) - \widehat{\varphi}_s(x)| \le M \big( |\varphi_t - \varphi_s|_{V^r} + |\varphi_t|_{V^r} \, |||k_s - k_t|||_{V^r} \big) V(x)^r . $$

iii) For any $r \in (0, 1]$ and $0 \le n_0 \le n$:

$$ E[V(X_n)^r \mid X_{n_0}] \le \lambda^{r(n - n_0)} V^r(X_{n_0}) + \frac{1 - \lambda^{r(n - n_0)}}{1 - \lambda^r} \, b^r \le M \, V^r(X_{n_0}) . $$

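Proof. i) We proceed using the geometric ergodicity of $k_s$:

$$ |\widehat{\varphi}(x)| = \Big| \sum_{l \ge 0} [k_s^l - \pi_s](\varphi)(x) \Big| \le |\varphi|_{V^r} \sum_{l \ge 0} \| k_s^l - \pi_s \|_{V^r} \le M \, |\varphi|_{V^r} \Big[ \sum_{l \ge 0} \rho^l \Big] V(x)^r $$

for some $\rho \in (0, 1)$ and $M > 0$ not depending on $s$ via (A1); it is now straightforward to conclude.

ii) Via the Poisson equation we have $\widehat{\varphi}_t(x) - \widehat{\varphi}_s(x) = A(x) + B(x)$ where

$$ A(x) = \sum_{l \ge 0} [k_t^l - \pi_t](\varphi_t)(x) - \sum_{l \ge 0} [k_s^l - \pi_s](\varphi_t)(x) ; \quad B(x) = \sum_{l \ge 0} [k_s^l - \pi_s](\varphi_t - \varphi_s)(x) . \tag{35} $$

We start with $B(x)$. For each summand we have:

$$ \big| [k_s^l - \pi_s](\varphi_t - \varphi_s)(x) \big| = |\varphi_t - \varphi_s|_{V^r} \, \Big| [k_s^l - \pi_s]\Big( \frac{\varphi_t - \varphi_s}{|\varphi_t - \varphi_s|_{V^r}} \Big)(x) \Big| $$
$$ \le |\varphi_t - \varphi_s|_{V^r} \, \| k_s^l - \pi_s \|_{V^r} \le M \, |\varphi_t - \varphi_s|_{V^r} \, \rho^l \, V(x)^r , $$

where $M > 0$ and $\rho \in (0, 1)$ depend only on $r$ due to (A1). Hence, summing over $l$, there exists an $M > 0$ such
that for any $x \in E$:

$$ |B(x)| \le M \, |\varphi_t - \varphi_s|_{V^r} \, V(x)^r . $$

Returning to $A(x)$ in (35), one can use Lemma C2 of [4] to show that this is equal to:

$$ \sum_{l \ge 0} \Big[ \sum_{i=0}^{l-1} [k_t^i - \pi_t][k_t - k_s][k_s^{l-i-1} - \pi_s](\varphi_t)(x) - [\pi_t - \pi_s]\big( [k_s^l - \pi_s](\varphi_t) \big) \Big] . $$

Using identical manipulations to [5], it follows that:

$$ \Big| \sum_{l \ge 0} \sum_{i=0}^{l-1} [k_t^i - \pi_t][k_t - k_s][k_s^{l-i-1} - \pi_s](\varphi_t)(x) \Big| \le M \, |\varphi_t|_{V^r} \, |||k_s - k_t|||_{V^r} \, V(x)^r $$

and, for some constant $M = M(r) > 0$:

$$ \Big| \sum_{l \ge 0} [\pi_t - \pi_s]\big( [k_s^l - \pi_s](\varphi_t) \big) \Big| \le M \, |\varphi_t|_{V^r} \, |||k_s - k_t|||_{V^r} \, V(x)^r . $$

iii) We will use the drift condition in (A1). Using Jensen's inequality (since $r \le 1$) we obtain $k_n(V^r)(X_{n-1}) \le
\lambda^r V^r(X_{n-1}) + b^r$ for the constants $b, \lambda$ appearing in the drift condition. Using this inequality and conditional
expectations:

$$ E[V^r(X_n) \mid X_{n_0}] = E[k_n(V^r)(X_{n-1}) \mid X_{n_0}] \le \lambda^r E[V^r(X_{n-1}) \mid X_{n_0}] + b^r . $$

Applying this iteratively gives the required result. □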
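
For completeness, the iteration behind part iii) can be written out explicitly. The display below is a worked sketch that assumes only the one-step bound $k_n(V^r) \le \lambda^r V^r + b^r$ stated in the proof, with the shorthand $m = n - n_0$ introduced here for the illustration:

```latex
\begin{aligned}
\mathbb{E}[V^r(X_n) \mid X_{n_0}]
 &\le \lambda^{r}\,\mathbb{E}[V^r(X_{n-1}) \mid X_{n_0}] + b^{r} \\
 &\le \lambda^{2r}\,\mathbb{E}[V^r(X_{n-2}) \mid X_{n_0}] + b^{r}(1 + \lambda^{r}) \\
 &\;\;\vdots \\
 &\le \lambda^{rm}\,V^r(X_{n_0}) + b^{r}\sum_{i=0}^{m-1}\lambda^{ri}
  = \lambda^{rm}\,V^r(X_{n_0}) + \frac{1-\lambda^{rm}}{1-\lambda^{r}}\,b^{r}.
\end{aligned}
```

If, in addition, $V \ge 1$ and $\lambda \in (0, 1)$ (the usual normalisation of a drift function, assumed here), the right-hand side is at most $(1 + b^r/(1 - \lambda^r)) V^r(X_{n_0})$, which provides a constant $M$ that does not depend on $n$ or $n_0$.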

Theorem A.1 (Decomposition). Assume (A1(i)(ii), A2). Consider the collection of functions $\{\varphi_s\}_{s \in [\phi_0, 1]}$ with
$\varphi_s \in \mathscr{L}_{V^r}$ for some $r \in [0, 1)$ and such that:

i) $\sup_s |\varphi_s|_{V^r} < \infty$,

ii) $|\varphi_t - \varphi_s|_{V^r} \le M \, |t - s|$.

Set $\varphi_n \,(= \varphi_{n,d}) := \varphi_{s = \phi_n(d)}$ and consider the solution to the Poisson equation $\widehat{\varphi}_n = \mathcal{P}(\varphi_n, k_n, \pi_n)$. Then, for
$n_0 \le n_1 \le n_2$ we can write:

$$ \sum_{n=n_1}^{n_2} \{ \varphi_n(X_n) - \pi_n(\varphi_n) \} = M_{n_1:n_2} + R_{n_1:n_2} $$

for the martingale term:

$$ M_{n_1:n_2} = \sum_{n=n_1+1}^{n_2} \{ \widehat{\varphi}_n(X_n) - k_n(\widehat{\varphi}_n)(X_{n-1}) \} $$

such that for any $p > 1$ with $rp \le 1$:

$$ E[\, |M_{n_1:n_2}|^p \mid X_{n_0} ] \le M \, d^{\frac{p}{2} \vee 1} \, V^{rp}(X_{n_0}) , $$

and a residual term $R_{n_1:n_2}$ such that for any $p > 0$ with $rp \le 1$:

$$ E[\, |R_{n_1:n_2}|^p \mid X_{n_0} ] \le M \, V^{rp}(X_{n_0}) . $$

Proof. Using the Poisson equation </?«(•) — ffniPn) = ^n(') — k n (f> n )(-), simple addition and subtraction of the 
appropriate terms gives that: 

^ { Wn{X n ) - TT n (ip n ) } = M ni:rl2 + D ni:n2 - E ni - n2 + T ni - n2 ; (36) 
n-ni 

"2 

Dni:n 2 = I^C^n-l) - ^"-1 (-^n-l)] j 

n— ni + 1 
"2 

E ni :n 2 = ^ [^n(-^n-l) — ^n-l(^n-l)] j 
?i— ni + 1 

-M^i :n2 

Now, using Lemma A.1 (i), (iii) and the uniform bound in assumption (i) we get directly that:

$$\mathbb{E}[\,|T_{n_1:n_2}|^p \mid X_{n_0}\,] \le M\,V^{rp}(X_{n_0}) . \tag{37}$$

Also, Lemma A.1 (i) together with assumption (i) imply that:

$$|(\varphi_n - \varphi_{n-1})(X_{n-1})| \le |\varphi_n - \varphi_{n-1}|_{V^r}\,V^r(X_{n-1}) \le M\,\tfrac{1}{d}\,V^r(X_{n-1}) ,$$

thus, calling again upon Lemma A.1 (iii), one obtains that:

$$\mathbb{E}[\,|E_{n_1:n_2}|^p \mid X_{n_0}\,] \le M\,V^{rp}(X_{n_0}) . \tag{38}$$

Consider now $D_{n_1:n_2}$. Using first Lemma A.1 (ii), then conditions (i)-(ii) and (A2), one obtains:

$$|\hat\varphi_n(X_{n-1}) - \hat\varphi_{n-1}(X_{n-1})| \le M\,\tfrac{1}{d}\,V(X_{n-1})^r .$$

Thus, using also Lemma A.1 (iii) we obtain directly that:

$$\mathbb{E}[\,|D_{n_1:n_2}|^p \mid X_{n_0}\,] \le M\,V(X_{n_0})^{rp} . \tag{39}$$

The bounds (37), (38) and (39) prove the stated result for the growth of $\mathbb{E}[\,|R_{n_1:n_2}|^p\,]$.

Now consider the martingale term $M_{n_1:n_2}$. One can use a modification of the Burkholder-Davis-Gundy inequality (e.g. [55, pp. 499-500]) which states that for any $p > 1$:

$$\mathbb{E}[\,|M_{n_1:n_2}|^p \mid X_{n_0}\,] \le M(p)\,d^{\frac{p}{2}\vee 1 - 1}\sum_{n=n_1+1}^{n_2}\mathbb{E}\big[\,|\hat\varphi_n(X_n) - k_n(\hat\varphi_n)(X_{n-1})|^p \mid X_{n_0}\,\big] , \tag{40}$$

see [5] for the proof. Using Lemma A.1 (i) we obtain that:

$$|\hat\varphi_n(X_n) - k_n(\hat\varphi_n)(X_{n-1})| \le M\,|\varphi_n|_{V^r}\,\big(V^r(X_n) + k_n(V^r)(X_{n-1})\big) .$$

Using this bound, Jensen's inequality giving $(k_n(V^r)(X_{n-1}))^p \le k_n(V^{rp})(X_{n-1})$, the fact that $rp \le 1$ and Lemma A.1 (iii), we continue from (40) to obtain the stated bound for $M_{n_1:n_2}$. □






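Proposition A.1. Let $\varphi \in \mathscr{L}_{V^r}$ with $r \in [0,1]$. Consider two sequences of times $\{s(d)\}_d$, $\{t(d)\}_d$ in $[\phi_0,1]$ such that $s(d) < t(d)$ and $s(d) \to s$, $t(d) \to t$ with $s < t$. If we also have that $\sup_d \mathbb{E}[\,V^r(X_{l_d(s(d))})\,] < \infty$, then:

$$\mathbb{E}_{X_{l_d(s(d))}}\big[\,\varphi(X_{l_d(t(d))})\,\big] \to \pi_t(\varphi) , \quad \text{in } L_1 .$$

Proof. Recall that $\pi_u(x) \propto \exp\{u\,g(x)\}$ for $u \in [\phi_0,1]$. We define, for $c \in (0,\tfrac12)$:

$$n_d = l_d(t(d)) - l_d(s(d)) ; \quad m_d = \lfloor (l_d(t(d)) - l_d(s(d)))^c \rfloor ; \quad u_d = l_d(s(d)) + n_d - m_d .$$

Note that from the definition of $l_d(\cdot)$ we have $n_d = \mathcal{O}(d)$, whereas $m_d = \mathcal{O}(d^c)$. We have that:

$$\big|\,\mathbb{E}_{X_{l_d(s(d))}}[\varphi(X_{l_d(t(d))})] - \pi_t(\varphi)\,\big| \le \big|\,\mathbb{E}_{X_{l_d(s(d))}}[\varphi(X_{l_d(t(d))}) - k_{u_d}^{m_d}(\varphi)(X_{u_d})]\,\big| + \big|\,\mathbb{E}_{X_{l_d(s(d))}}[k_{u_d}^{m_d}(\varphi)(X_{u_d})] - \pi_{u_d}(\varphi)\,\big| + |\pi_{u_d}(\varphi) - \pi_t(\varphi)| . \tag{41}$$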



Now, the last term on the R.H.S. of (41) goes to zero as $d \to \infty$: this is via dominated convergence after noticing that

$$\pi_{u_d}(\varphi) = \frac{\int \varphi(x)\,e^{\{\phi_0 + \frac{u_d}{d}(1-\phi_0)\}\,g(x)}\,dx}{\int e^{\{\phi_0 + \frac{u_d}{d}(1-\phi_0)\}\,g(x)}\,dx}$$

with the integrand of the term, for instance, in the numerator converging almost everywhere (w.r.t. Lebesgue) to $\varphi(x)e^{t g(x)}$ (simply notice that $\lim u_d/d = \lim\{l_d(t(d))/d\} = (t-\phi_0)/(1-\phi_0)$) and being bounded in absolute value (due to the assumption of $g$ being upper bounded) by the integrable function $M\,V^r(x)e^{\phi_0 g(x)}$. Also, the second term on the R.H.S. of (41) goes to zero in $L_1$, due to the uniform drift condition in (A1); to see this, note that (working as in the proof of Lemma A.1 (i)) condition (A1) gives $\|k_s^l(x,\cdot) - \pi_s\|_{V^r} \le M\rho^l\,V(x)^r$ for any $s \in (\phi_0,1]$, so we also have that $|k_{u_d}^{m_d}(\varphi)(X_{u_d}) - \pi_{u_d}(\varphi)| \le M\rho^{m_d}\,V(X_{u_d})^r$. Taking expectations and using Lemma A.1 (iii) we obtain that:

$$\big|\,\mathbb{E}_{X_{l_d(s(d))}}[k_{u_d}^{m_d}(\varphi)(X_{u_d})] - \pi_{u_d}(\varphi)\,\big| \le M\rho^{m_d}\,V(X_{l_d(s(d))})^r ,$$

which vanishes in $L_1$ as $d \to \infty$ due to the assumption $\sup_d \mathbb{E}[\,V^r(X_{l_d(s(d))})\,] < \infty$.

We now focus on the first term on the R.H.S. of (41). The following decomposition holds, as intermediate terms in the sum below cancel out, for $u_d \ge 1$:

$$\mathbb{E}_{X_{l_d(s(d))}}\big[\varphi(X_{l_d(t(d))}) - k_{u_d}^{m_d}(\varphi)(X_{u_d})\big] = \mathbb{E}_{X_{l_d(s(d))}}\Big[\sum_{j=0}^{m_d-1}\big\{k_{(u_d+1):(l_d(t(d))-j)}\,k_{u_d}^{j}(\varphi)(X_{u_d}) - k_{(u_d+1):(l_d(t(d))-(j+1))}\,k_{u_d}^{j+1}(\varphi)(X_{u_d})\big\}\Big]$$

where we use the notation $k_{i:j}(\varphi)(x) = \int k_i(x,dx_1)\times\cdots\times k_j(\varphi)(x_{j-i})$, $i \le j$. Each of the summands is equal to

$$k_{(u_d+1):(l_d(t(d))-(j+1))}\,\big[k_{l_d(t(d))-j} - k_{u_d}\big]\big(k_{u_d}^{j}(\varphi)\big)(X_{u_d})$$

which is bounded in absolute value by

$$M\,|\varphi|_{V^r}\;k_{(u_d+1):(l_d(t(d))-(j+1))}(V^r)(X_{u_d})\;|||k_{l_d(t(d))-j} - k_{u_d}|||_{V^r} .$$

Now, from Lemma A.1 (iii):

$$k_{(u_d+1):(l_d(t(d))-(j+1))}(V^r)(X_{u_d}) \le M\,V^r(X_{u_d}) .$$

Also, from condition (A2), there exists an $M > 0$ such that

$$|||k_{l_d(t(d))-j} - k_{u_d}|||_{V^r} \le M\,\frac{1-\phi_0}{d}\,\big(l_d(t(d)) - j - u_d\big) = M\,\frac{1-\phi_0}{d}\,(m_d - j) .$$

Thus, using again Lemma A.1 (iii) we are left with

$$\big|\,\mathbb{E}_{X_{l_d(s(d))}}\big[\varphi(X_{l_d(t(d))}) - k_{u_d}^{m_d}(\varphi)(X_{u_d})\big]\,\big| \le M\,V^r(X_{l_d(s(d))})\sum_{j=0}^{m_d-1}\frac{m_d - j}{d} .$$

As $\sup_d \mathbb{E}[\,V^r(X_{l_d(s(d))})\,] < \infty$, since $m_d = \mathcal{O}(d^c)$ with $c \in (0,\tfrac12)$ we can easily conclude. □
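The final step relies on $\sum_{j=0}^{m_d-1}(m_d-j)/d = m_d(m_d+1)/(2d)$, which is of order $d^{2c-1}$ and so vanishes exactly when $c < \tfrac12$. The short sketch below checks this numerically; taking $m_d = \lfloor d^c\rfloor$ directly (rather than through $l_d$) is a simplifying assumption made here for illustration.

```python
import math


def telescoping_bound(d: int, c: float) -> float:
    """Return sum_{j=0}^{m_d - 1} (m_d - j) / d with m_d = floor(d**c)."""
    m_d = math.floor(d ** c)
    return sum(m_d - j for j in range(m_d)) / d


if __name__ == "__main__":
    for d in (10**3, 10**6, 10**9):
        # c = 0.25 < 1/2: the bound shrinks; c = 0.5: it stays of order one.
        print(d, telescoping_bound(d, 0.25), telescoping_bound(d, 0.5))
```

With `c = 0.25` the printed bound decreases towards zero as `d` grows, whereas with `c = 0.5` it stays near one half, which illustrates why the proof needs $c < \tfrac12$.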
B Proofs for Section 3

There are related results to Theorem 3.1 (see e.g. [46, 50]); however, in our case the proofs will be based on assumptions commonly made in the MCMC and SMC literature, which will be easily verifiable. The general framework will involve constructing a martingale difference array (an approach also followed in the above-mentioned papers).

Proposition B.1. Assume (A1)-(A2) and $g \in \mathscr{L}_{V^r}$ with $r \in [0,\tfrac12)$. The family of functions $\{\varphi_s\}_{s\in[\phi_0,1]}$ specified as:

$$\varphi_s(x) = k_s(\hat g_s^2)(x) - \{k_s(\hat g_s)(x)\}^2 , \quad \hat g_s = \mathcal{P}(g, k_s, \pi_s) ,$$

satisfies conditions (i) and (ii) of Theorem A.1 for $\bar r = 2r \in [0,1)$.

Proof. Lemma A.1 (i) gives that $|\hat g_s(x)| \le M\,|g|_{V^r}\,V^r(x)$. Thus, due to the presence of quadratic functions in the definition of $\varphi_s(\cdot)$ we get directly that $|\varphi_s(x)| \le M\,V^{\bar r}(x)$ so condition (i) in Theorem A.1 is satisfied. We move on to condition (ii) of the theorem. Let us first deal with:

$$\{k_t(\hat g_t)(x)\}^2 - \{k_s(\hat g_s)(x)\}^2$$

which is equal to

$$\{k_t(\hat g_t)(x) - k_s(\hat g_t)(x)\}\{k_t(\hat g_t)(x) + k_s(\hat g_t)(x)\} + \{k_s(\hat g_t - \hat g_s)(x)\}\{k_s(\hat g_t + \hat g_s)(x)\} .$$

The terms with the additions are bounded in absolute value by $M\,V^r(x)$, whereas:

$$|k_t(\hat g_t)(x) - k_s(\hat g_t)(x)| \le M\,|t-s|\,V(x)^r , \quad |k_s(\hat g_t - \hat g_s)(x)| \le M\,|t-s|\,V(x)^r ,$$

the first inequality following from assumption (A2) and the second from Lemma A.1 (ii). Thus, we have proved:

$$\big|\{k_t(\hat g_t)(x)\}^2 - \{k_s(\hat g_s)(x)\}^2\big| \le M\,|t-s|\,V(x)^{\bar r}$$

for $\bar r = 2r \in [0,1)$. We move on to the second term in the expression for $\varphi_s$ and work as follows:

$$k_t(\hat g_t^2)(x) - k_s(\hat g_s^2)(x) = k_t(\hat g_t^2)(x) - k_s(\hat g_t^2)(x) + k_s(\hat g_t^2)(x) - k_s(\hat g_s^2)(x) .$$

The first difference is controlled, from assumption (A2), by $M\,|t-s|\,V(x)^{\bar r}$, whereas for the second difference we use Cauchy-Schwarz to obtain:

$$|k_s(\hat g_t^2)(x) - k_s(\hat g_s^2)(x)| \le \{k_s(\hat g_t - \hat g_s)^2(x)\}^{1/2}\,\{k_s(\hat g_t + \hat g_s)^2(x)\}^{1/2} \le M\,|t-s|\,V(x)^{\bar r}$$

where, for the second inequality, we have used Lemma A.1 (ii). The proof is now complete. □

Proof of Theorem 3.1. We adopt the decomposition as in Theorem A.1. Set $\hat g_s$ to be a solution to the Poisson equation (with $\pi_s$, $k_s$) and $\hat g_{n-1,d} = \hat g_{s=\phi_{n-1}(d)}$. The decomposition is then:

$$\sum_{n=1}^{l_d(t)}\{g(X_{n-1}(d)) - \pi_{n-1,d}(g)\} = M_{0:l_d(t)-1} + R_{0:l_d(t)-1}$$

where

$$M_{0:l_d(t)-1} = \sum_{n=1}^{l_d(t)-1}\{\hat g_{n,d}(X_n(d)) - k_{n,d}(\hat g_{n,d})(X_{n-1}(d))\} .$$

It is clear, via Theorem A.1, that $R_{0:l_d(t)-1}/\sqrt{d}$ goes to zero in $L_1$ and hence we need consider the martingale array term only.
Writing

$$\xi_{n,d} = \hat g_{n,d}(X_n(d)) - k_{n,d}(\hat g_{n,d})(X_{n-1}(d))$$

one observes that $\{\xi_{n,d}, \mathscr{F}_{n,d}\}$, with $\mathscr{F}_{n,d}$ denoting the filtration generated by $\{X_n(d)\}$, is a square-integrable martingale difference array with zero mean. In order to prove the fCLT, one can use Theorem 5.1 of [5] which gives the following sufficient conditions for proving Theorem 3.1:
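a) For every $\epsilon > 0$, $\mathcal{I}_{\epsilon,d} := \frac{1}{d}\sum_{n=1}^{d}\mathbb{E}\big[\,\xi_{n,d}^2\,\mathbb{I}_{|\xi_{n,d}| \ge \epsilon\sqrt{d}} \mid \mathscr{F}_{n-1,d}\,\big] \to 0$ in probability.

b) For any $t \in [\phi_0,1]$, $I_d(t) := \frac{1}{d}\sum_{n=1}^{l_d(t)}\mathbb{E}[\,\xi_{n,d}^2 \mid \mathscr{F}_{n-1,d}\,]$ converges in probability to the quantity $\sigma^2_{\phi_0:t}/(1-\phi_0)^2$.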

We proceed by proving these two statements. 

We prove a) first. Recall that $r \in [0,\tfrac12)$, so we can choose $\delta > 0$ so that $r(2+\delta) \le 1$. In the first line below, one can use simple calculations and in the second line Lemma A.1 (i) and the drift condition with $r(2+\delta) \le 1$, to obtain:

$$|\xi_{n,d}|^{2+\delta} \le M(\delta)\big(|\hat g_{n,d}(X_n(d))|^{2+\delta} + |k_{n,d}(\hat g_{n,d})(X_{n-1}(d))|^{2+\delta}\big)$$
$$\le M(\delta)\big(V(X_n(d)) + V(X_{n-1}(d))\big) .$$

Thus, using Lemma A.1 (iii) we get: $\sup_{n,d}\mathbb{E}[\,|\xi_{n,d}|^{2+\delta}\,] < \infty$. A straightforward application of Hölder's inequality, then followed by Markov's inequality, now gives that:



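$$\mathbb{E}[\,\mathcal{I}_{\epsilon,d}\,] \le \frac{1}{\epsilon^{\delta}\,d^{\delta/2}}\cdot\frac{1}{d}\sum_{n=1}^{d}\mathbb{E}\big[\,|\xi_{n,d}|^{2+\delta}\,\big] \to 0 .$$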



Thus, we have proved a). 
For b), we can rewrite:

$$I_d(t) = \frac{1}{d}\sum_{n=1}^{l_d(t)}\Big[k_{n,d}(\hat g_{n,d}^2)(X_{n-1}(d)) - \{k_{n,d}(\hat g_{n,d})(X_{n-1}(d))\}^2\Big] . \tag{42}$$



We will be calling upon Theorem A.1 to prove convergence of the above quantity to an asymptotic variance. Note that, via Proposition B.1, the mappings

$$\varphi_s := k_s(\hat g_s^2) - \{k_s(\hat g_s)\}^2$$

satisfy conditions (i)-(ii) of Theorem A.1. We define $\varphi_{n,d} = \varphi_{s=\phi_n(d)}$ and rewrite $I_d(t)$ as:

$$I_d(t) = \frac{1}{d}\sum_{n=0}^{l_d(t)-1}\varphi_{n+1,d}(X_n(d)) .$$

We also define:

$$J_d(t) = \frac{1}{d}\sum_{n=0}^{l_d(t)-1}\varphi_{n,d}(X_n(d)) .$$



Due to condition (ii) of Theorem A.1 we have that $I_d(t) - J_d(t) \to 0$ in $L_1$. Applying Theorem A.1 one can deduce that:

$$\lim_{d\to\infty}\Big\{J_d(t) - \frac{1}{d}\sum_{n=0}^{l_d(t)-1}\pi_{n,d}(\varphi_{n,d})\Big\} = 0 , \quad \text{in } L_1 .$$

Now, $s \mapsto \pi_s(\varphi_s)$ is continuous as a mapping on $[\phi_0,1]$, so from standard calculus we get that $\frac{1}{d}\sum_{n=0}^{l_d(t)-1}\pi_{n,d}(\varphi_{n,d}) \to (1-\phi_0)^{-1}\int_{\phi_0}^{t}\pi_s(\varphi_s)\,ds$. Combining the results, we have proven that:

$$I_d(t) \to (1-\phi_0)^{-1}\int_{\phi_0}^{t}\pi_s(\varphi_s)\,ds = \sigma^2_{\phi_0:t}/(1-\phi_0)^2 , \quad \text{in } L_1 .$$

Note that by Corollary 3.1 of Theorem 3.2 of [35] we also have a CLT for $S_t$.

□ 

C Proofs for Section 4
C.1 Results for Proposition 4.1

We will first require a proposition summarising convergence results, with emphasis on uniform convergence w.r.t. 
the time index. 






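Proposition C.1. Assume (A1)-(A2). Let $s(d)$ be a sequence on $[\phi_0,1]$ such that $s(d) \to s$. Then:

i) $\sup_{t\in[s(d),1]}\mathbb{E}\big[\,|S_{s(d):t,j}|\,\big]/\sqrt{d} \to 0$.

ii) $\sup_{t\in[s(d),1]}\big|\,\mathbb{E}[\,S^2_{s(d):t,j}\,] - \sigma^2_{s:t}\,\big| \to 0$.

iii) $\sup_{t\in[s(d),1]}\big|\,\mathbb{E}[\,S_{s(d):t,j}\,]\,\big| \to 0$.

iv) $\sup_{d\ge1,\,t\in[s(d),1]}\mathbb{E}\big[\,|S_{s(d):t,j}|^{2+\epsilon}\,\big] < \infty$, for some $\epsilon > 0$.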

Proof. For simplicity, we will omit reference to the co-ordinate index $j$. Applying the decomposition of Theorem A.1 for $\varphi_s \equiv g$ and $n_0 = 0$ gives that:

$$S_{s(d):t} = \frac{1-\phi_0}{\sqrt{d}}\big(M_{l_d(s(d)):(l_d(t)-1)} + R_{l_d(s(d)):(l_d(t)-1)}\big)$$

with (choosing $p = 2+\epsilon$ for $\epsilon > 0$ so that $rp \le 1$):

$$\mathbb{E}\big[\,|M_{l_d(s(d)):(l_d(t)-1)}|^{2+\epsilon}\,\big] \le M\,d^{1+\frac{\epsilon}{2}}\,\mathbb{E}[\,V(X_0)\,] ,$$

and (choosing $p = 2+\epsilon$ for $\epsilon > 0$ so that $rp \le 1$):

$$\mathbb{E}\big[\,|R_{l_d(s(d)):(l_d(t)-1)}|^{2+\epsilon}\,\big] \le M\,\mathbb{E}[\,V(X_0)\,] .$$

One now needs to notice that these bounds are uniform in $s$, $t$, $d$, thus statements (i) and (iv) of the proposition follow directly from the above estimates; statement (iii) also follows directly after taking into consideration that $\mathbb{E}[\,M_{l_d(s(d)):(l_d(t)-1)}\,] = 0$. It remains to prove (ii). The residual term $R_{l_d(s(d)):(l_d(t)-1)}/\sqrt{d}$ vanishes in the limit in $L_{2+\epsilon}$-norm, thus it will not affect the final result, that is:

$$\sup_{t\in[s(d),1]}\Big|\,\mathbb{E}[\,S^2_{s(d):t}\,] - \frac{(1-\phi_0)^2}{d}\,\mathbb{E}\big[\,M^2_{l_d(s(d)):(l_d(t)-1)}\,\big]\,\Big| \to 0 .$$


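Now, straightforward analytical calculations yield:

$$\frac{1}{d}\,\mathbb{E}\big[\,M^2_{l_d(s(d)):(l_d(t)-1)}\,\big] = \frac{1}{d}\sum_{n=l_d(s(d))}^{l_d(t)-1}\mathbb{E}\big[\{\hat g_n(X_n) - k_n(\hat g_n)(X_{n-1})\}^2\big] = \mathbb{E}\Big[\frac{1}{d}\sum_{n=l_d(s(d))-1}^{l_d(t)-2}\varphi_{n+1}(X_n)\Big] ,$$

where we have set:

$$\varphi_s = k_s(\hat g_s^2) - \{k_s(\hat g_s)\}^2 ; \quad \varphi_n = \varphi_{s=\phi_n} .$$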
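Since $|\varphi_{n+1} - \varphi_n|_{V^{2r}} \le M/d$ from Proposition B.1, we also have:

$$\sup_{t\in[s(d),1]}\Big|\,\mathbb{E}\Big[\frac{1}{d}\sum_{n=l_d(s(d))-1}^{l_d(t)-2}\varphi_{n+1}(X_n)\Big] - \mathbb{E}\Big[\frac{1}{d}\sum_{n=l_d(s(d))-1}^{l_d(t)-2}\varphi_n(X_n)\Big]\,\Big| \to 0 .$$

Now, Theorem A.1 and Proposition B.1 imply that

$$\sup_{t\in[s(d),1]}\mathbb{E}\,\Big|\,\frac{1}{d}\sum_{n=l_d(s(d))-1}^{l_d(t)-2}\{\varphi_n(X_n) - \pi_n(\varphi_n)\}\,\Big| \to 0 .$$

Finally, due to the continuity of $s \mapsto \pi_s(\varphi_s)$, it is a standard result from Riemann integration (see e.g. Theorem 6.8 of [53]) that:

$$\sup_{t\in[s(d),1]}\Big|\,\frac{1}{d}\sum_{n=l_d(s(d))-1}^{l_d(t)-2}\pi_n(\varphi_n) - \frac{1}{1-\phi_0}\int_{s}^{t}\pi_u(\varphi_u)\,du\,\Big| \to 0$$

and we conclude.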



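□

Proof of Proposition 4.1. For some sequence $s(d)$ in $[\phi_0,1]$ such that $s(d) \to s$, we will consider the function in $t \in [s(d),1]$:

$$f_d(s(d),t) = \frac{\mathbb{E}^2\big[\exp\{\frac{1}{\sqrt{d}}\sum_{j=1}^{d}S_{s(d):t,j}\}\big]}{\mathbb{E}\big[\exp\{\frac{2}{\sqrt{d}}\sum_{j=1}^{d}S_{s(d):t,j}\}\big]} = \bigg(\frac{\mathbb{E}^2\big[\exp\{\frac{1}{\sqrt{d}}S_{s(d):t,1}\}\big]}{\mathbb{E}\big[\exp\{\frac{2}{\sqrt{d}}S_{s(d):t,1}\}\big]}\bigg)^{d}$$

the second result following due to the independence over $j$. In the rest of the proof we will omit reference to the co-ordinate index 1. Due to the ratio in the definition of $f_d(s(d),t)$, we can clearly re-write:

$$f_d(s(d),t) = \bigg(\frac{\mathbb{E}^2\big[\exp\{\frac{1}{\sqrt{d}}\bar S_{s(d):t}\}\big]}{\mathbb{E}\big[\exp\{\frac{2}{\sqrt{d}}\bar S_{s(d):t}\}\big]}\bigg)^{d}$$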



for $\bar S_{s(d):t} = S_{s(d):t} - \mathbb{E}[\,S_{s(d):t}\,]$. We will use the notation '$h_d(t) \to_t h(t)$' to denote convergence, as $d \to \infty$, uniformly for all $t$ in $[s(d),1]$, that is $\sup_{t\in[s(d),1]}|h_d(t) - h(t)| \to 0$. We will aim at proving, using the results in Proposition C.1, that:

$$f_d(s(d),t) \to_t e^{-\sigma^2_{s:t}} , \tag{43}$$

or, equivalently, that $\sup_{t\in[s(d),1]}|f_d(s(d),t) - e^{-\sigma^2_{s:t}}| \to 0$, under the convention that $\sigma^2_{s:t} = 0$ for $t \le s$. Once we have obtained this, the required result will follow directly by induction. To see that, note that for proving that $t_1(d) \to t_1$ we will use the established result for $s(d) \equiv \phi_0$: uniform convergence of $f_d(\phi_0,t)$ to $e^{-\sigma^2_{\phi_0:t}}$ together with the fact that $e^{-\sigma^2_{\phi_0:t}}$ is decreasing in $t$ will give directly that the hitting time of the threshold $a$ for $f_d(\phi_0,t)$ will converge to that of $e^{-\sigma^2_{\phi_0:t}}$. Now, assuming we have proved that $t_n(d) \to t_n$, we will then use the established uniform convergence result for $s(d) = t_n(d)$ to obtain directly that $t_{n+1}(d) \to t_{n+1}$.

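We will now establish (43). Note that we have, by construction: $\mathbb{E}[\,\bar S_{s(d):t}\,] = 0$. We use directly Taylor expansions to obtain for any fixed $t \in [s(d),1]$:

$$e^{\frac{2}{\sqrt{d}}\bar S_{s(d):t}} = 1 + \tfrac{2}{\sqrt{d}}\,\bar S_{s(d):t} + \tfrac{2}{d}\,\bar S^2_{s(d):t}\,e^{2\zeta_{d,t}} ; \tag{44}$$
$$e^{\frac{1}{\sqrt{d}}\bar S_{s(d):t}} = 1 + \tfrac{1}{\sqrt{d}}\,\bar S_{s(d):t} + \tfrac{1}{2d}\,\bar S^2_{s(d):t}\,e^{\zeta'_{d,t}} , \tag{45}$$

where $\zeta_{d,t}, \zeta'_{d,t} \in \big[\tfrac{1}{\sqrt{d}}\bar S_{s(d):t} \wedge 0 ,\ \tfrac{1}{\sqrt{d}}\bar S_{s(d):t} \vee 0\big]$. Note here that since $g$ is upper bounded and $\sup_{n,d}\mathbb{E}[\,|g(X_{n,1}(d))|\,] < \infty$, we have that $\tfrac{1}{\sqrt{d}}\bar S_{s(d):t}$ is upper bounded. Thus, we obtain directly that:

$$\zeta_{d,t} \le M , \quad \zeta'_{d,t} \le M ; \quad |\zeta_{d,t}| + |\zeta'_{d,t}| \le \tfrac{M}{\sqrt{d}}\,|\bar S_{s(d):t}| .$$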

Taking expectations in (44):

$$\mathbb{E}\big[\,e^{\frac{2}{\sqrt{d}}\bar S_{s(d):t}}\,\big] = 1 + \tfrac{2}{d}\,\mathbb{E}\big[\,\bar S^2_{s(d):t}\,e^{2\zeta_{d,t}}\,\big] .$$

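Now consider the term:

$$a_d(t) := \mathbb{E}\big[\,\bar S^2_{s(d):t}\,e^{2\zeta_{d,t}}\,\big] = \mathbb{E}\big[\,\bar S^2_{s(d):t}\,\big] + \mathbb{E}\big[\,\bar S^2_{s(d):t}\,(e^{2\zeta_{d,t}} - 1)\,\big] .$$

Using Hölder's inequality and the fact that $\mathbb{E}[\,|e^{2\zeta_{d,t}} - 1|^q\,] \le M(q)\,\mathbb{E}[\,|\zeta_{d,t}|\,]$ for any $q \ge 1$, via the Lipschitz continuity of $x \mapsto |e^{2x} - 1|^q$ on $(-\infty, M]$, we obtain that for $\epsilon > 0$ as in Proposition C.1 (iv):

$$\big|\,\mathbb{E}\big[\,\bar S^2_{s(d):t}\,(e^{2\zeta_{d,t}} - 1)\,\big]\,\big| \le \mathbb{E}^{\frac{2}{2+\epsilon}}\big[\,|\bar S_{s(d):t}|^{2+\epsilon}\,\big]\;\mathbb{E}^{\frac{\epsilon}{2+\epsilon}}\big[\,|e^{2\zeta_{d,t}} - 1|^{\frac{2+\epsilon}{\epsilon}}\,\big]$$
$$\le M\,\mathbb{E}^{\frac{\epsilon}{2+\epsilon}}\big[\,|\zeta_{d,t}|\,\big] \to_t 0$$

the last limit following from Proposition C.1 (i). Thus, using also Proposition C.1 (ii)-(iii), we have proven that $a_d(t) \to_t \sigma^2_{s:t}$. Note now that:

$$\big|\,(1 + \tfrac{2}{d}\,a_d(t))^d - (1 + \tfrac{2}{d}\,\sigma^2_{s:t})^d\,\big| \le M\,|a_d(t) - \sigma^2_{s:t}| ; \quad (1 + \tfrac{2}{d}\,\sigma^2_{s:t})^d \to_t e^{2\sigma^2_{s:t}} ,$$

the first result following from the derivative of $x \mapsto (1 + \tfrac{2x}{d})^d$ being bounded for $x \in [0,M]$. Thus we have proven that: $\big(\mathbb{E}[\,e^{\frac{2}{\sqrt{d}}\bar S_{s(d):t}}\,]\big)^d \to_t e^{2\sigma^2_{s:t}}$. Using similar manipulations and the Taylor expansion (45) we obtain that:

$$\big(\mathbb{E}^2[\,e^{\frac{1}{\sqrt{d}}\bar S_{s(d):t}}\,]\big)^d \to_t e^{\sigma^2_{s:t}} .$$

Taking the ratio, the uniform convergence result in (43) is proved. □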
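The limit $f_d \to e^{-\sigma^2}$ can be checked by hand in a toy case. The sketch below assumes, purely for illustration, that $\bar S = \sigma\varepsilon$ with $\varepsilon = \pm 1$ equiprobable (so $\mathbb{E}[\bar S] = 0$ and $\mathbb{E}[\bar S^2] = \sigma^2$), for which $\mathbb{E}[e^{a\bar S}] = \cosh(a\sigma)$ in closed form; this distribution and the function name `f_d` are assumptions made here, not the paper's $S_{s(d):t}$.

```python
import math


def f_d(d: int, sigma: float) -> float:
    """(E^2[exp(S/sqrt(d))] / E[exp(2S/sqrt(d))])**d for the toy S = sigma * (+/-1)."""
    x = sigma / math.sqrt(d)
    # Work on the log scale so that the d-th power is numerically stable.
    log_ratio = 2.0 * math.log(math.cosh(x)) - math.log(math.cosh(2.0 * x))
    return math.exp(d * log_ratio)


if __name__ == "__main__":
    sigma = 1.0
    for d in (10, 100, 10**4, 10**6):
        print(d, f_d(d, sigma), math.exp(-sigma ** 2))
```

A Taylor expansion of $\log\cosh$ shows the gap to $e^{-\sigma^2}$ is of order $1/d$ in this toy case, in line with the second-order expansions (44)-(45).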
C.2 Results for Theorems 4.1 and 4.2

To prove Theorems 4.1 and 4.2 we will first require some technical lemmas. Here the equally weighted $d$-dimensional resampled (at the deterministic time instances $t_k(d)$) particles are written with a prime notation; so $X'^{\,i}_{l_d(t_k(d)),j}$ will denote the $j$-th co-ordinate of the $i$-th particle, immediately after the resampling procedure at $t_k(d)$.

Proposition C.2. Assume (A1)(i)-(ii) and let $k \in \{1,\ldots,m^*\}$. Then, there exists an $M(k) < \infty$ such that for any $N \ge 1$, $d \ge 1$, $i \in \{1,\ldots,N\}$, $j \in \{1,\ldots,d\}$:

$$\mathbb{E}\big[\,V(X'^{\,i}_{l_d(t_k(d)),j})\,\big] \le M(k)\,N^k .$$

Proof. We will use an inductive proof on the resampling times (assumed to be deterministic). It is first remarked (using Lemma A.1 (iii)) that for every $k \in \{1,\ldots,m^*\}$:

$$\mathbb{E}\big[\,V(X^{i}_{l_d(t_k(d)),j}) \mid \mathscr{F}^{N}_{t_{k-1}(d)}\,\big] \le M\,V(X'^{\,i}_{l_d(t_{k-1}(d)),j}) \tag{46}$$

where $\mathscr{F}^{N}_{t_{k-1}(d)}$ is the filtration generated by the particle system up to and including the $(k-1)$-th resampling time and $M < \infty$ does not depend upon $t_k(d)$, $t_{k-1}(d)$ or indeed $d$.
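At the first resampling time, we have (averaging over the resampling index) that

$$\mathbb{E}\big[\,V(X'^{\,i}_{l_d(t_1(d)),j}) \mid \mathscr{F}^{N}_{t_1(d)-}\,\big] = \sum_{i=1}^{N} w_{l_d(t_1(d))}\big(x^{i}_{l_d(t_0(d)):l_d(t_1(d))-1}\big)\,V(x^{i}_{l_d(t_1(d)),j})$$

where $\mathscr{F}^{N}_{t_1(d)-}$ is the filtration generated by the particle system up to the first resampling time (but excluding resampling) and $w_{l_d(t_1(d))}(x^{i}_{l_d(t_0(d)):l_d(t_1(d))-1})$ is the normalized importance weight. Now, clearly (due to the normalised weights being bounded by 1):

$$\mathbb{E}\big[\,V(X'^{\,i}_{l_d(t_1(d)),j}) \mid \mathscr{F}^{N}_{t_1(d)-}\,\big] \le \sum_{i=1}^{N} V(x^{i}_{l_d(t_1(d)),j})$$

and, via (46), $\mathbb{E}[\,V(X'^{\,i}_{l_d(t_1(d)),j})\,] \le NM$, which gives the result for the first resampling time.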

Using induction, if we assume that the result holds at the $(k-1)$-th time we resample ($k \ge 2$), it follows that (for $\mathscr{F}^{N}_{t_k(d)-}$ being the filtration generated by the particle system up to the $k$-th resampling time, but excluding resampling):

$$\mathbb{E}\big[\,V(X'^{\,i}_{l_d(t_k(d)),j}) \mid \mathscr{F}^{N}_{t_k(d)-}\,\big] = \sum_{i=1}^{N} w_{l_d(t_k(d))}\big(x^{i}_{l_d(t_{k-1}(d)):l_d(t_k(d))-1}\big)\,V(x^{i}_{l_d(t_k(d)),j}) \le \sum_{i=1}^{N} V(x^{i}_{l_d(t_k(d)),j}) .$$

Thus, via (46) and the exchangeability of the particle and dimension index, we obtain that

$$\mathbb{E}\big[\,V(X'^{\,i}_{l_d(t_k(d)),j})\,\big] \le N M\,\mathbb{E}\big[\,V(X'^{\,i}_{l_d(t_{k-1}(d)),j})\,\big] .$$

The proof now follows directly. □

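Proposition C.3. Assume (A1)-(A2). Let $\varphi \in \mathscr{L}_{V^r}$, $r \in [0,\tfrac12)$. Then for any fixed $N$, any $k \in \{1,\ldots,m^*\}$ and any $i \in \{1,\ldots,N\}$ we have

$$\frac{1}{d}\sum_{j=1}^{d}\varphi(X'^{\,i}_{l_d(t_k(d)),j}) \to \pi_{t_k}(\varphi) , \quad \text{in } L_1 .$$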

Proof. We distinguish between two cases: $k = 1$ and $k > 1$. When $k = 1$, due to the boundedness of the normalised weights and the exchangeability of the particle indices we have that:

$$\mathbb{E}\,\Big|\,\frac{1}{d}\sum_{j=1}^{d}\varphi(X'^{\,i}_{l_d(t_1(d)),j}) - \pi_{t_1}(\varphi)\,\Big| \le N\,\mathbb{E}\,\Big|\,\frac{1}{d}\sum_{j=1}^{d}\varphi(X^{i}_{l_d(t_1(d)),j}) - \pi_{t_1}(\varphi)\,\Big| . \tag{47}$$

Adding and subtracting the term $\mathbb{E}[\,\varphi(X^{i}_{l_d(t_1(d)),j})\,]$ we obtain that the expectation on the R.H.S. of the above equation is bounded by:

$$\mathbb{E}\,\Big|\,\frac{1}{d}\sum_{j=1}^{d}\big\{\varphi(X^{i}_{l_d(t_1(d)),j}) - \mathbb{E}[\,\varphi(X^{i}_{l_d(t_1(d)),j})\,]\big\}\,\Big| + \big|\,\mathbb{E}[\,\varphi(X^{i}_{l_d(t_1(d)),1})\,] - \pi_{t_1}(\varphi)\,\big| . \tag{48}$$

For the first term, due to the independence across dimension, considering second moments we get the upper bound:

$$\frac{1}{\sqrt{d}}\,\mathbb{E}^{1/2}\Big[\big(\varphi(X^{i}_{l_d(t_1(d)),j}) - \mathbb{E}[\,\varphi(X^{i}_{l_d(t_1(d)),j})\,]\big)^2\Big] .$$

As $\varphi \in \mathscr{L}_{V^r}$ with $r < 1/2$, the argument of the expectation is upper-bounded by $M\,V(X^{i}_{l_d(t_1(d)),j})$ whose expectation is controlled via Lemma A.1 (iii). Thus the above quantity is $\mathcal{O}(d^{-1/2})$. For the second term in (48) we can use directly Proposition A.1 (for time sequences required there selected as $s(d) \equiv \phi_0$ and $t(d) = t_1(d)$) to show also that this term will vanish in the limit $d \to \infty$.

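The general case with $k > 1$ is similar, but requires some additional arguments as resampling eliminates the i.i.d. property. Again, integrating out the resampling index as in (47) we are left with the quantity:

$$N\,\mathbb{E}\,\Big|\,\frac{1}{d}\sum_{j=1}^{d}\varphi(X^{i}_{l_d(t_k(d)),j}) - \pi_{t_k}(\varphi)\,\Big| .$$

Adding and subtracting $\frac{1}{d}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,\varphi(X^{i}_{l_d(t_k(d)),j})\,]$ within the expectation, the above quantity is upper bounded by:

$$N\,\mathbb{E}\,\Big|\,\frac{1}{d}\sum_{j=1}^{d}\big\{\varphi(X^{i}_{l_d(t_k(d)),j}) - \mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,\varphi(X^{i}_{l_d(t_k(d)),j})\,]\big\}\,\Big| + N\,\mathbb{E}\,\Big|\,\frac{1}{d}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,\varphi(X^{i}_{l_d(t_k(d)),j})\,] - \pi_{t_k}(\varphi)\,\Big| . \tag{49}$$

For the first of these two terms, due to conditional independence across dimension and exchangeability in the dimensionality index $j$, looking at the second moment we obtain the upper bound:

$$\frac{N}{\sqrt{d}}\,\mathbb{E}^{1/2}\Big[\big(\varphi(X^{i}_{l_d(t_k(d)),j}) - \mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,\varphi(X^{i}_{l_d(t_k(d)),j})\,]\big)^2\Big] .$$

Since $|\varphi(x)| \le M\,V^r(x)$ with $r < \tfrac12$, the variable in the expectation above is upper bounded by $M\,\big(V(X^{i}_{l_d(t_k(d)),j}) + V(X'^{\,i}_{l_d(t_{k-1}(d)),j})\big)$ which due to Proposition C.2 is bounded in expectation by some $M(N,k)$. Thus, the first term in (49) is $\mathcal{O}(d^{-1/2})$. The second term in (49) now, due to exchangeability over $j$, is upper bounded by $N\,\mathbb{E}\,\big|\,\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),1}}[\,\varphi(X^{i}_{l_d(t_k(d)),1})\,] - \pi_{t_k}(\varphi)\,\big|$, which again due to Proposition A.1 vanishes in the limit $d \to \infty$. □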

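For the Markov chain $X^{i}_{n,j}$ considered on the instances $n_1 \le n \le n_2$ we will henceforth use the notation $\mathbb{E}_{\pi_s}[\,g(X^{i}_{n,j})\,]$ to specify that we impose the initial distribution $X^{i}_{n_1,j} \sim \pi_s$.

Proposition C.4. Assume (A1)-(A2) and that $g \in \mathscr{L}_{V^r}$ with $r \in [0,\tfrac12)$. For $k \in \{1,\ldots,m^*\}$, $i \in \{1,\ldots,N\}$ and a sequence $s_k(d)$ with $s_k(d) > t_{k-1}(d)$ and $s_k(d) \to s_k > t_{k-1}$ we define:

$$E_{i,j} = \sum_{n}\Big(\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,g(X^{i}_{n,j})\,] - \mathbb{E}_{\pi_{t_{k-1}}}[\,g(X^{i}_{n,j})\,]\Big) , \quad 1 \le j \le d ,$$

for subscript $n$ in the range $l_d(t_{k-1}(d)) \le n \le l_d(s_k(d)) - 1$. Then, we have that:

$$\frac{1}{d}\sum_{j=1}^{d}E_{i,j} \to 0 , \quad \text{in } L_1 .$$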
Proof. We will make use of the Poisson equation and employ the decomposition (36) used in the proof of Theorem A.1. In particular, a straightforward calculation gives that:

$$E_{i,j} = \sum_{n=n_1+1}^{n_2}\big[\delta_{X_{n_1,j}} - \pi_{t_{k-1}}\big]\big((k_{n_1+1:n})[\hat g_n - \hat g_{n-1}]\big)$$
$$\qquad + \big(\mathbb{E}_{X_{n_1,j}} - \mathbb{E}_{\pi_{t_{k-1}}}\big)\big[\,g(X_{n_2,j}) - \hat g_{n_2}(X_{n_2,j})\,\big] + \hat g_{n_1}(X_{n_1,j}) - \pi_{t_{k-1}}(\hat g_{n_1}) , \tag{50}$$

where $\hat g_n = \mathcal{P}(g, k_n, \pi_n)$, and we have set:

$$n_1 = l_d(t_{k-1}(d)) ; \quad n_2 = l_d(s_k(d)) - 1 ; \quad X_{n_1,j} = X'^{\,i}_{l_d(t_{k-1}(d)),j} .$$

It is remarked that the martingale term in the original expansion (36) has expectation 0, so is not involved in our manipulations. We will first deal with the sum in the first line of (50), that is (when taking into account the averaging over $j$) with:

$$A_d := \frac{1}{d}\sum_{j=1}^{d}\sum_{n=n_1+1}^{n_2}\big[\delta_{X_{n_1,j}} - \pi_{t_{k-1}}\big]\big((k_{n_1+1:n})[\hat g_n - \hat g_{n-1}]\big) .$$

Now each summand in the above double sum is upper bounded by

$$\frac{M}{d}\,\big\|\,[\delta_{X_{n_1,j}} - \pi_{t_{k-1}}](k_{n_1+1:n})\,\big\|_{V^r} .$$

To bound this $V^r$-norm one can apply Theorem 8 of [32]; here, under (A1)-(A2) we have that either:

$$\big\|\,[\delta_{X_{n_1,j}} - \pi_{t_{k-1}}](k_{n_1+1:n})\,\big\|_{V^r} \le M\rho^{\,n-n_1}\,V(X_{n_1,j})^r + M'\zeta^{\,n-n_1} \tag{51}$$

for some $\rho, \zeta \in (0,1)$, $0 < M, M' < \infty$, when $B_{j-1,n}$ (of that paper) is 1. Or, if $B_{j-1,n} > 1$, one has the bound

II [Sx nuj - Tr^Kfc^+i^llyr < Mp^^ n ^V{X nuJ Y + M'Q Vr[n - n ^ 

with $j^*$ as in the final equation of [32, p. 1650]. (Note that this follows from a uniform-in-time drift condition, which follows from Proposition 4 of [32] (via (A1)).) By summing up first over $n$ and then over $j$ (and dividing by $d$), using also Proposition C.3 along the way to control $\sum_j V(X_{n_1,j})^r/d$, we have that:

$$A_d \to 0 , \quad \text{in } L_1 .$$

A similar use of the bound in (51) and Proposition C.3 can give directly that the second term in (50) will vanish in the limit when summing up over $j$ and dividing by $d$. Finally, for the last term in (50): Proposition C.3 is not directly applicable here as one has to address the fact that the function $\hat g_{n_1}$ depends on $d$. Using Lemma A.1 (ii), one can replace $\hat g_{n_1} = \hat g_{l_d(t_{k-1}(d))}$ by $\hat g_{t_{k-1}}$ and then apply Proposition C.3 and the fact that $t_{k-1}(d) \to t_{k-1}$ to show that the remainder term goes to zero in $L_1$ (when averaging over $j$). The proof is now complete.

□ 



Proof of Theorem 4.1. Recall the definition of the ESS:

$$\mathrm{ESS}_{(t_{k-1}(d),s_k(d))}(N) = \frac{\big(\sum_{i=1}^{N}\exp\{(1-\phi_0)\,\alpha^{i}(d)\}\big)^2}{\sum_{i=1}^{N}\exp\{2(1-\phi_0)\,\alpha^{i}(d)\}}$$

where we have defined:

$$\alpha^{i}(d) = \frac{1}{d}\sum_{j=1}^{d}\{G_{i,j} + E_{i,j}\}$$

with:

$$G_{i,j} = \sum_{n}\Big\{g(X^{i}_{n,j}) - \mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,g(X^{i}_{n,j})\,]\Big\} ,$$
$$E_{i,j} = \sum_{n}\Big\{\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,g(X^{i}_{n,j})\,] - \mathbb{E}_{\pi_{t_{k-1}}}[\,g(X^{i}_{n,j})\,]\Big\} ,$$

for subscript $n$ in the range $l_d(t_{k-1}(d)) \le n \le l_d(s_k(d)) - 1$. From Proposition C.4 we get directly that $\sum_{j=1}^{d}E_{i,j}/d \to 0$ (in $L_1$). Thus, we are left with $G_{i,j}$ which corresponds to a martingale under the filtration we define below. In the proof below, we consider the weak convergence for a single particle. However, it is possible to prove a multivariate CLT for all the particles using the Cramér-Wold device. This calculation is very similar to that given below and is hence omitted.

Consider some chosen particle $i$, with $1 \le i \le N$. For any $d \ge 1$ we define the filtration $\mathscr{G}_{0,d} \subseteq \mathscr{G}_{1,d} \subseteq \cdots \subseteq \mathscr{G}_{d,d}$ as follows:

$$\mathscr{G}_{0,d} = \sigma\big(X'^{\,l}_{l_d(t_{k-1}(d)),j},\ 1 \le j \le d,\ 1 \le l \le N\big) ;$$
$$\mathscr{G}_{j,d} = \mathscr{G}_{j-1,d} \vee \sigma\big(X^{i}_{n,j},\ l_d(t_{k-1}(d)) \le n \le l_d(s_k(d)) - 1\big) , \quad j \ge 1 . \tag{52}$$

That is, the $\sigma$-algebra $\mathscr{G}_{0,d}$ contains the information about all particles, along all $d$ co-ordinates until (and including) the resampling step; then the rest of the filtration is built up by adding information for the subsequent trajectory of the various co-ordinates. Critically, conditionally on $\mathscr{G}_{0,d}$ these trajectories are independent. One can now easily check that

$$\frac{1}{d}\sum_{k=1}^{j}G_{i,k} , \quad 1 \le j \le d ,$$

is a martingale w.r.t. the filtration in (52). Now, to apply the CLT for triangular martingale arrays, we will show that for every $i \in \{1,\ldots,N\}$:


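a) That in $L_1$:

$$\lim_{d\to\infty}\frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}\big[\,G^2_{i,j} \mid \mathscr{G}_{j-1,d}\,\big] = \frac{1}{1-\phi_0}\int_{t_{k-1}}^{s_k}\pi_u\big(\hat g_u^2 - k_u(\hat g_u)^2\big)\,du .$$

b) For any $\epsilon > 0$, that in $L_1$:

$$\lim_{d\to\infty}\frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}\big[\,G^2_{i,j}\,\mathbb{I}_{|G_{i,j}| \ge \epsilon d} \mid \mathscr{G}_{j-1,d}\,\big] = 0 .$$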

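This will allow us to show that $(1-\phi_0)\,\alpha^{i}(d)$ will converge weakly to the appropriate normal random variable. Notice that, due to the conditional independence mentioned above and the definition of the filtration in (52), we in fact have that:

$$\mathbb{E}\big[\,G^2_{i,j} \mid \mathscr{G}_{j-1,d}\,\big] = \mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,G^2_{i,j}\,\big] ;$$
$$\mathbb{E}\big[\,G^2_{i,j}\,\mathbb{I}_{|G_{i,j}| \ge \epsilon d} \mid \mathscr{G}_{j-1,d}\,\big] = \mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,G^2_{i,j}\,\mathbb{I}_{|G_{i,j}| \ge \epsilon d}\,\big] .$$

We make the following definition:

$$\bar G_{i,j} = \sum_{n}\{g(X^{i}_{n,j}) - \pi_n(g)\} = M_{n_1:n_2,i,j} + R_{n_1:n_2,i,j} ,$$

(for convenience we have set $n_1 = l_d(t_{k-1}(d))$ and $n_2 = l_d(s_k(d)) - 1$) with the terms $M_{n_1:n_2,i,j}$ and $R_{n_1:n_2,i,j}$ defined as in Theorem A.1, with the extra subscripts indicating the number of the particle and the co-ordinate. Notice that $G_{i,j} = \bar G_{i,j} - \mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,\bar G_{i,j}\,]$.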

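We start with a). We first use the fact that:

$$\frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,\bar G^2_{i,j}\,\big] - \frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,G^2_{i,j}\,\big] \to 0 , \quad \text{in } L_1 .$$

To see that, simply note that the above difference is equal to:

$$\frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}^2_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,\bar G_{i,j}\,\big] = \frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}^2_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,R_{n_1:n_2,i,j}\,\big] \le \frac{M}{d^2}\sum_{j=1}^{d}V^{2r}(X'^{\,i}_{l_d(t_{k-1}(d)),j})$$

where we first used the fact that $M_{n_1:n_2,i,j}$ is a martingale (thus, of zero expectation) and then Theorem A.1 to obtain the bound; the bounding term vanishes due to Proposition C.2. We then have that: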



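$$\frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,\bar G^2_{i,j}\,\big] = \frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,M^2_{i,j} + R^2_{i,j} + 2M_{i,j}R_{i,j}\,\big]$$
$$= \frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,M^2_{i,j}\,\big] + \mathcal{O}(d^{-1/2}) . \tag{53}$$

To yield the $\mathcal{O}(d^{-1/2})$ one can use the bound

$$\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,R^2_{i,j}\,\big] \le M\,V^{2r}(X'^{\,i}_{l_d(t_{k-1}(d)),j})$$

from Theorem A.1 and then (using Cauchy-Schwarz and Theorem A.1):

$$\big|\,\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}[\,M_{i,j}R_{i,j}\,]\,\big| \le \mathbb{E}^{1/2}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,M^2_{i,j}\,\big]\;\mathbb{E}^{1/2}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,R^2_{i,j}\,\big] \le M\sqrt{d}\,V(X'^{\,i}_{l_d(t_{k-1}(d)),j})^{2r} .$$

One then only needs to make use of Proposition C.2 to get (53). Now, using the analytical definition of $M_{i,j}$ from Theorem A.1 we have: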


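$$\frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,M^2_{i,j}\,\big] = \frac{1}{d^2}\sum_{j=1}^{d}\sum_{n=n_1+1}^{n_2}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\{\hat g_n(X^{i}_{n,j}) - k_n(\hat g_n)(X^{i}_{n-1,j})\}^2\big]$$
$$= \frac{1}{d^2}\sum_{j=1}^{d}\sum_{n=n_1}^{n_2-1}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,\varphi_{n+1}(X^{i}_{n,j})\,\big] \tag{54}$$

where:

$$\varphi_n = k_n(\hat g_n^2) - [k_n(\hat g_n)]^2 ; \quad \hat g_n = \mathcal{P}(g, k_n, \pi_n) .$$

Using again the decomposition in Theorem A.1 but now for $\varphi_n$ as above (which due to Proposition B.1 satisfies the requirements of Theorem A.1), we get that:

$$\Big|\,\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\Big[\sum_{n=n_1}^{n_2-1}\{\varphi_{n+1}(X^{i}_{n,j}) - \pi_n(\varphi_{n+1})\}\Big]\,\Big| = \big|\,\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,R_{n_1:(n_2-1),i,j}\,\big]\,\big|$$
$$\le M\,V^{2r}(X'^{\,i}_{l_d(t_{k-1}(d)),j}) .$$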
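Thus, continuing from (54), and using the above bound and Proposition C.2, we have:

$$\mathbb{E}\,\Big|\,\frac{1}{d^2}\sum_{j=1}^{d}\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,M^2_{i,j}\,\big] - \frac{1}{d}\sum_{n=n_1}^{n_2-1}\pi_n(\varphi_{n+1})\,\Big| = \mathcal{O}(\tfrac{1}{d}) . \tag{55}$$

The proof for a) is completed using the deterministic limit:

$$\frac{1}{d}\sum_{n=n_1}^{n_2-1}\pi_n(\varphi_{n+1}) \to \frac{1}{1-\phi_0}\int_{t_{k-1}}^{s_k}\pi_u\big(\hat g_u^2 - k_u(\hat g_u)^2\big)\,du .$$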


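For b), we choose some $\delta$ so that $r(2+\delta) \le 1$, and obtain the following bound:

$$\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,|G_{i,j}|^{2+\delta}\,\big] \le M\,\mathbb{E}_{X'^{\,i}_{l_d(t_{k-1}(d)),j}}\big[\,|\bar G_{i,j}|^{2+\delta}\,\big] \le M\,d^{1+\frac{\delta}{2}}\,V(X'^{\,i}_{l_d(t_{k-1}(d)),j})^{r(2+\delta)}$$

where for the last inequality we used the growth bounds in Theorem A.1. Also using, first, Hölder's inequality, then, Markov's inequality and, finally, the above bound we find that: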

V(X''\ ,,,, ) r5 d & l 2 
< MV(X''\ , n , f r d ■ ' d(tfc 7 l( f» J . 

Thus, we also have: 



Due to Proposition C.2 this bound proves part b).



□ 

Proof of Theorem 4.2. The proof is similar to that of Theorem 4.1 (as the final resampling time is strictly less than 1) and Theorem 3.3; it is omitted for brevity. □

C.3 Stochastic Times 

Proof of Theorem 4.3. Our proof will keep $d$ fixed until the point at which we can apply Theorem 4.1. Conditionally on the chosen $\{a_k\}$ we have:

m*(5) 

P[Sl\Q%]< £ £ P[\±ESS {ti _ iidhs) (N)-ESS {ti _ i{dhs] \ >v|ESS (tLi(d) , s) -a fe |] . 

Define 

e(d) := infinf | ESS {t «_ i(d)iS) - a k | ; 
we remark $\lim_{d\to\infty}\epsilon(d) = \epsilon > 0$ (with probability one). Hence we have:

m* (5) 

Application of the Markov inequality yields that: 

P [ \ ] < maxE [ | * ESS (tS _ lW)l .) (JV) " ESS(tjt_ lW >,.) I ] ■ 

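Since $k$, $s$ lie in a finite set and $\epsilon > 0$, we need only deal with the expectation as $d$ grows. Note, in the expectation, the case $s = t^{\delta}_{k}(d)$ is not of interest; ESS is constant and hence lower-bounded in all other cases.
Application of Theorem 4.1 now yields:

$$\lim_{d\to\infty}\mathbb{E}\Big[\,\big|\tfrac{1}{N}\mathrm{ESS}_{(t^{\delta}_{k-1}(d),s)}(N) - \mathrm{ESS}_{(t^{\delta}_{k-1}(d),s)}\big|\,\Big] = \mathbb{E}\Big[\,\big|\tfrac{1}{N}\mathrm{ESS}_{(t^{\delta}_{k-1},s)}(N) - \mathrm{ESS}_{(t^{\delta}_{k-1},s)}\big|\,\Big]$$

where

$$\mathrm{ESS}_{(t^{\delta}_{k-1},s)}(N) = \frac{\big(\sum_{j=1}^{N}\exp\{X^{k}_{j}\}\big)^2}{\sum_{j=1}^{N}\exp\{2X^{k}_{j}\}} ; \quad \mathrm{ESS}_{(t^{\delta}_{k-1},s)} = \exp\{-\sigma^2_{t^{\delta}_{k-1}:s}\} ,$$

with $X^{k}_{j} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2_{t^{\delta}_{k-1}:s})$. We set:

$$\alpha^{k}_{j} = \exp\{X^{k}_{j}\} ; \quad \beta^{k}_{j} = \exp\{2X^{k}_{j}\} ; \quad \alpha^{k} = \exp\{\tfrac12\sigma^2_{t^{\delta}_{k-1}:s}\} ; \quad \beta^{k} = \exp\{2\sigma^2_{t^{\delta}_{k-1}:s}\} .$$

Then, we are to bound:

$$\mathbb{E}\,\bigg[\,\bigg|\,\frac{\big(\frac{1}{N}\sum_{j=1}^{N}\alpha^{k}_{j}\big)^2}{\frac{1}{N}\sum_{j=1}^{N}\beta^{k}_{j}} - \frac{(\alpha^{k})^2}{\beta^{k}}\,\bigg|\,\bigg] .$$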
E 
We have the decomposition



AT l^] = l 



N 



3 = 1 



1 

k 



N £-~i ' J Rk y AT L^i 1 1 



N 



, fc\2 



3 = 1 



For the first term on the R.H.S. of the above equation, as ESS divided by $N$ is upper-bounded by 1, we can use Jensen and the Marcinkiewicz-Zygmund inequality. For the second term, via the relation $x^2 - y^2 = (x+y)(x-y)$ and Cauchy-Schwarz, one can use the same inequality to conclude that for some finite $M(k,\delta,s)$:



E[|^ESS (t j_ i) . ) (JV)-ESS (t j_ i) , ) |] < 
Thus, we have proven that: limd->oo Fffi \ ^ ] < M ^'^ S ^ as required 



M(k,S,s) 



N 



□ 
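The limiting quantity in this proof is easy to illustrate by simulation: for i.i.d. $X_j \sim \mathcal{N}(0,\sigma^2)$ the scaled effective sample size $\frac{1}{N}(\sum_j e^{X_j})^2/\sum_j e^{2X_j}$ concentrates around $(\alpha^k)^2/\beta^k = e^{-\sigma^2}$ as $N$ grows. The sketch below is a plain Monte Carlo check using only the standard library; the function names, the seed and the value of $\sigma^2$ are illustrative assumptions, not part of the paper.

```python
import math
import random


def ess(log_weights):
    """Effective sample size (sum w)^2 / sum w^2, computed stably from log-weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]  # rescaling leaves the ratio unchanged
    return sum(w) ** 2 / sum(wi * wi for wi in w)


def scaled_ess_lognormal(n, sigma2, seed=0):
    """ESS / N for log-weights drawn i.i.d. from N(0, sigma2)."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    log_w = [rng.gauss(0.0, sd) for _ in range(n)]
    return ess(log_w) / n


if __name__ == "__main__":
    sigma2 = 0.5
    for n in (100, 10_000, 200_000):
        print(n, scaled_ess_lognormal(n, sigma2), math.exp(-sigma2))
```

Equal weights give `ess == N`, while a single dominant weight gives a value close to 1, matching the bounds $1 \le \mathrm{ESS} \le N$ used in the text.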



D Verifying the Assumptions 

Proof of Proposition 5.1. We start with (A1)(i)-(ii); to establish uniform (in $s$) drift and minorization conditions for the kernel $k_s$. The proof is standard and included for completeness.
It is first noted that, for any $\delta_q > 0$, if $|x - y| \le \delta_q$:

$$q_s(x,y) \ge \frac{s^{1/2}}{\sqrt{2\pi}}\exp\Big\{-\frac{s}{2}\,\delta_q^2\Big\} \ge \frac{\phi_0^{1/2}}{\sqrt{2\pi}}\exp\Big\{-\frac{1}{2}\,\delta_q^2\Big\} . \tag{56}$$



This property will be used below. To establish the minorization, one can follow the proof of Theorem 2.2 of [50] to show that for any $x$, with $y \in B(x,\delta_q/2)$ (the open ball, centered at $x$ and of radius $\delta_q/2$), $A \subseteq B(x,\delta_q/2)$



k s (y,A) > 7](x,6 q /2) / (q s (z, y) A q s (y, z))dz > r](x, 5 q /2)e q / dz 

J A J A 

where rj(x,S q /2) = ia£ x eB(ai,6 II) 7r i( a; )/ll 7, V>olloo & n d S q is as (|56p . e q as the RHS of the inequality in (|56p . Hence, 
we have the uniform minorization condition. 

To prove the drift, we do not require it to hold for $s = \phi_0$ as, in the algorithm, we sample exactly from $\pi_{\phi_0}$. Nonetheless, by our assumptions there exists a drift condition for $k_{\phi_0}$ (a symmetric normal random walk Metropolis kernel of invariant measure $\pi_{\phi_0}$); write the parameters $\lambda$, $b$. Now, for any $s \in (\phi_0,1]$, via Lemma 5 of [3] and using that for any



x,y > q^o&y) S V$, 



where 



one has 



k B (V)(x) < -=(fc^(V)(x) - V{x)) + V(x) 



V(x) = \\e+°9\\lf/e%°W. 



(57) 



Now one can easily find ace [(1 — <f) ^ 2 ) A (—X/^/tfio), 1 — A^ such that k s (V)(x) < XV(x) + bl c {x) with 
$\lambda \in (0,1)$, $b < \infty$. Hence, the uniform drift condition is verified. (A1)(iii) can be verified in a similar manner to e.g. [32] and is omitted.

Now to (A2), which is a little more complex. Recall, we want to establish that there exists an $M < \infty$ such that for any $s,t \in (\phi_0,1]$, $|||k_s - k_t|||_V \le M\,|s-t|$. For simplicity, we will consider only the increment of the proposal (via change of variables), so $q_s$ is a zero mean normal density, with variance $1/s$. For any fixed $x \in \mathbb{R}$, $q_s$ is a bounded continuous function of $s \in [\phi_0,1]$ and further, the first derivative w.r.t. $s$ is upper-bounded by
it follows that for any $x \in \mathbb{R}$, $s,t \in [\phi_0,1]$:

\q s (x) - q t {x)\ < 



2V27T0Q 



1 



2y/2n(j>Q 



e )\s-t\ 



i>ox /2 hence 



(58) 



Now central to our proof is the consideration of the acceptance probability, which is $a_s(x,z) = 1 \wedge \exp\{s(g(x+z) - g(x))\}$. Let

$$A(x) = \{z : g(x+z) - g(x) \ge 0\} \tag{59}$$

then if $z \in A(x)$, $a_s(x,z) = 1$. We begin by considering the acceptance part of the kernel. The difficult issue is when $z \in A(x)^c$, which is dealt with now:

$$\int_{A(x)^c}\varphi(x+z)\exp\{-sG(x,z)\}\,q_s(z)\,dz - \int_{A(x)^c}\varphi(x+z)\exp\{-tG(x,z)\}\,q_t(z)\,dz \tag{60}$$

where $\varphi \in \mathscr{L}_V$. Now for any fixed $x$, $z \in A(x)^c$ one has that

$$|\exp\{-sG(x,z)\} - \exp\{-tG(x,z)\}| \le \big(G(x,z)\,e^{-\frac{\phi_0}{2}G(x,z)}\big)\,|s-t| \tag{61}$$

for every $s,t \in [\phi_0,1]$. Then, returning to (60), it can be decomposed into the sum of

$$\int_{A(x)^c}\varphi(x+z)\,[\exp\{-sG(x,z)\} - \exp\{-tG(x,z)\}]\,q_s(z)\,dz \tag{62}$$

and

$$\int_{A(x)^c}\varphi(x+z)\exp\{-tG(x,z)\}\,[q_s(z) - q_t(z)]\,dz . \tag{63}$$
First consider (62). Applying (61), it follows that (62) is upper-bounded by

$$C_{\phi_0}\,|\varphi|_V\,|s-t|\int_{A(x)^c}e^{-\frac{\phi_0}{2}g(x+z)}\,G(x,z)\,e^{-\frac{\phi_0}{2}G(x,z)}\,q_s(z)\,dz$$

where $C_{\phi_0}$ is associated to the Lyapunov function (57). Now as $e^{-\frac{\phi_0}{2}g(x+z)}\,e^{-\frac{\phi_0}{2}G(x,z)} = e^{-\frac{\phi_0}{2}g(x)}$, which is controlled by $V(x)$, and by assumption (30), $\int_{A(x)^c}G(x,z)\,q_s(z)\,dz$ is dealt with; hence (62) divided by $V(x)$ is upper-bounded by $C_{\phi_0}\,|\varphi|_V\,|s-t|\,C^*$. Our next task is (63). Applying (58), it is upper-bounded by

CMv I e-^ x ^e- tG ^\ S ~t\-^=e^ 2 / 2 dz = 

CMve-^ { e^^\ S ~t\-^=er^/2 dz . 

On dividing by the Lyapunov function, we are to deal with the expression $\exp\{-(t - \tfrac{\phi_0}{2})G(x,z)\}$. Now, $t > \phi_0/2$ and for any $x$, $z \in A(x)^c$, one has that $G(x,z) \ge 0$, hence this latter expression is upper-bounded by 1. This leaves the term $\int_{A(x)^c}e^{-\phi_0 z^2/2}\,dz$ which is finite. Hence, putting together the above arguments, we have shown that there exists an $M < \infty$ such that for any $s,t \in [\phi_0,1]$, $x \in \mathbb{R}$ one has

$$\Big|\int_{A(x)^c}\varphi(x+z)\,e^{-sG(x,z)}\,q_s(z)\,dz - \int_{A(x)^c}\varphi(x+z)\,e^{-tG(x,z)}\,q_t(z)\,dz\,\Big|\,\Big/\,V(x) \le M\,|s-t| ,$$

where we have applied (58).

Turning to the acceptance part of the kernel on A(x), we have 

<p{x + z)[q s {z) - q t (z)]dz < C^\<p\ v [ V(x + z)\s - t\ L^ e^^dz . 

A(x) JA(x) Zy/ZTTCpQ 

As $V(x+z) \le V(x)$ on $A(x)$, it follows that the term of interest is upper-bounded by $M\,|s-t|\,V(x)$ for some $M < \infty$. Hence the acceptance part of the kernel, divided by $V$, is upper bounded by $M\,|s-t|$. In the rejection part of the kernel, we have to control:

$$\varphi(x)\bigg[\int_{A(x)^c}[a_t(x,z) - a_s(x,z)]\,q_t(z)\,dz + \int_{A(x)^c}[q_t(z) - q_s(z)]\,a_s(x,z)\,dz + \int_{A(x)}[q_t(z) - q_s(z)]\,dz\bigg] .$$

Now, as $\varphi$ is controlled by $V$, we need to consider the continuity of the terms in the bracket. The latter two terms, via (58), are upper-bounded by $M\,|s-t|$. The first term is upper-bounded by $|s-t|\int_{A(x)^c}G(x,z)\,e^{-\frac{\phi_0}{2}G(x,z)}\,q_t(z)\,dz$ using (61). As $e^{-\frac{\phi_0}{2}G(x,z)} \le 1$ we can use (30) to complete the argument. □
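For concreteness, the kernel $k_s$ analysed in this proof (normal increment with variance $1/s$, acceptance probability $a_s(x,z) = 1 \wedge \exp\{s(g(x+z) - g(x))\}$, target $\pi_s \propto \exp\{s\,g\}$) can be sketched in one dimension as follows. The function names, the choice $g(x) = -x^2/2$ and the tuning values are assumptions made here for illustration only; with this $g$ the target $\pi_s$ is $\mathcal{N}(0, 1/s)$, which gives a simple sanity check.

```python
import math
import random


def rwm_step(x, s, g, rng):
    """One random walk Metropolis step targeting pi_s(x) proportional to exp{s g(x)}."""
    z = rng.gauss(0.0, 1.0 / math.sqrt(s))   # increment z ~ N(0, 1/s)
    log_ratio = s * (g(x + z) - g(x))        # log of exp{s (g(x+z) - g(x))}
    # Always accept on A(x) = {z : g(x+z) - g(x) >= 0}; otherwise accept w.p. exp(log_ratio).
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return x + z
    return x


def run_chain(n_steps, s, g, seed=0, x0=0.0):
    """Run the chain for n_steps and return the visited states."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n_steps):
        x = rwm_step(x, s, g, rng)
        path.append(x)
    return path


if __name__ == "__main__":
    s = 0.5
    path = run_chain(50_000, s, lambda x: -0.5 * x * x, seed=1)
    mean = sum(path) / len(path)
    var = sum((x - mean) ** 2 for x in path) / len(path)
    print(mean, var, 1.0 / s)  # the empirical variance should be close to 1/s
```

The branch `log_ratio >= 0.0` corresponds to $z \in A(x)$ in (59), where the proposal is accepted with probability one.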